Abstract

In this work we investigate the behavior of the distortion threshold that can be guaranteed in joint source-channel 
coding, to within a prescribed excess-distortion probability. We show that the gap between this threshold and the 
optimal average distortion is governed by a constant that we call the joint source-channel dispersion. This constant 
can be easily computed, since it is the sum of the source and channel dispersions, previously derived. The resulting 
performance is shown to be better than that of any separation-based scheme. For the proof, we use unequal error
protection channel coding, thus we also evaluate the dispersion of that setting.

I. Introduction

One of the most basic results of Information Theory, joint source-channel coding, due to Shannon [1], states that in the limit of large block-length $n$, a discrete memoryless source with distribution $P$ can be sent through a discrete memoryless channel with transition distribution $W$ and reconstructed with some expected average distortion $D$, as long as

$$R(P, D) < \rho\, C(W), \tag{1}$$



where $R(P, D)$ is the rate-distortion function of the source, $C(W)$ is the channel capacity and the bandwidth expansion ratio $\rho$ is the number of channel uses per source sample. We denote by $D^* = D^*(P, W, \rho)$ the distortion satisfying (1) with equality, known as the optimal performance theoretically attainable (OPTA). Beyond the expected distortion, one may be interested in ensuring that the distortion for one source block is below some threshold. To that end, we define an excess distortion event $\mathcal{E}(D)$ as

$$\mathcal{E}(D) \triangleq \{ d(\mathbf{S}, \hat{\mathbf{S}}) > D \}, \tag{2}$$

where

$$d(\mathbf{s}, \hat{\mathbf{s}}) = \frac{1}{n} \sum_{i=1}^{n} d(s_i, \hat{s}_i) \tag{3}$$

is the distortion between the source and reproduction words $\mathbf{s}$ and $\hat{\mathbf{s}}$.

We are interested in the probability of this event as a function of the block length. We note that two different approaches can be taken. In the first, the distortion threshold is fixed to some $D > D^*$ and one considers how the excess-distortion probability $\epsilon$ approaches zero as the block length $n$ grows. This leads to the joint source-channel excess-distortion exponent [2], [3]:

$$\epsilon(n) \cong \exp\{-n \cdot E(P, W, \rho, D)\}. \tag{4}$$

One may ask an alternative question: for a given excess-distortion probability $\epsilon$, let $D_n$ be the optimal (minimal) distortion threshold that can be achieved at blocklength $n$. How does the sequence $D_n$ approach $D^*$? In this work we show that the sequence behaves as

$$R(P, D_n) = \rho\, C(W) - \sqrt{\frac{V_J(P, W, \rho)}{n}}\, Q^{-1}(\epsilon), \tag{5}$$

where $Q^{-1}(\cdot)$ is the inverse of the complementary Gaussian cdf. We coin $V_J(P, W, \rho)$ the joint source-channel coding (JSCC) dispersion.
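To get a feel for the size of the backoff in (5), the following sketch evaluates its right-hand side numerically. It is only an illustration under stated assumptions: the function names and any sample values passed to them are hypothetical, and no higher-order correction term is modeled.

```python
from math import sqrt
from statistics import NormalDist


def q_inv(eps):
    """Inverse of the complementary Gaussian cdf, Q^{-1}(eps)."""
    return NormalDist().inv_cdf(1.0 - eps)


def jscc_rate_at_threshold(rho, capacity, v_joint, n, eps):
    """Right-hand side of (5): rho*C(W) - sqrt(V_J/n) * Q^{-1}(eps).

    rho, capacity and v_joint are inputs supplied by the caller
    (hypothetical values in any example); rates and capacity must
    use the same logarithm base.
    """
    return rho * capacity - sqrt(v_joint / n) * q_inv(eps)
```

For an excess-distortion probability below 1/2 the returned value lies below $\rho C(W)$ and approaches it at a rate proportional to $1/\sqrt{n}$.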

Similar problems have been stated and solved in the context of channel coding and lossless source coding in [4]. In [5] the channel dispersion result is tightened and extended, while in [6] (see also [7]) the parallel lossy source coding result is derived. In source coding, the rate redundancy above the rate-distortion function (or entropy in the lossless case) is measured, for a given excess-distortion probability $\epsilon$:

$$R_n \cong R(P, D) + \sqrt{\frac{V_S(P, D)}{n}}\, Q^{-1}(\epsilon), \tag{6}$$

where $V_S(P, D)$ is the source-coding dispersion. In channel coding, it is the rate gap below capacity, for a given error probability $\epsilon$:

$$R_n \cong C(W) - \sqrt{\frac{V_C(W)}{n}}\, Q^{-1}(\epsilon), \tag{7}$$

where $V_C(W)$ is the channel-coding dispersion. We show that the JSCC dispersion is related to the source and channel dispersions by the following simple formula (subject to certain regularity conditions):

$$V_J(P, W, \rho) = V_S(P, D^*) + \rho \cdot V_C(W). \tag{8}$$
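As a concrete instance of (8), the sketch below combines two closed-form dispersions. These closed forms are assumptions brought in for illustration and are not stated in this paper: a Bernoulli($p$) source under Hamming distortion (for $D$ below $\min(p, 1-p)$) and a binary symmetric channel with crossover probability $\delta$, both measured in bits squared. All function names are hypothetical.

```python
import math


def binary_source_dispersion(p):
    """Assumed closed form for a Bernoulli(p) source with Hamming distortion."""
    return p * (1 - p) * math.log2((1 - p) / p) ** 2


def bsc_dispersion(delta):
    """Assumed closed form for the dispersion of a BSC with crossover delta."""
    return delta * (1 - delta) * math.log2((1 - delta) / delta) ** 2


def jscc_dispersion(v_source, v_channel, rho):
    """Equation (8): V_J = V_S + rho * V_C."""
    return v_source + rho * v_channel
```

Note that for a uniform binary source the assumed source dispersion vanishes, so in that case only the channel term contributes to $V_J$.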

The achievability proof of (8) is closely related to that of Csiszar for the exponent [3]. Namely, multiple source 
codebooks are mapped into an unequal error protection channel coding scheme. The converse proof combines the 
strong channel coding converse [8] with the D-covering of a type class (e.g., [9]). 

The rest of the paper is organized as follows. Section II defines the notation. Section III revisits the channel coding problem and extends the dispersion result (7) to the unequal error protection (UEP) setting. Section IV uses this framework to prove our main JSCC dispersion result. Section V then shows the dispersion loss of separation-based schemes. Finally, in Section VI we consider a formulation where the distortion ratios are fixed but the bandwidth expansion ratio $\rho$ varies with $n$, and apply it to the lossless JSCC dispersion problem.

II. Notations 

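This paper uses lower case letters (e.g. $x$) to denote a particular value of the corresponding random variable denoted in capital letters (e.g. $X$). Vectors are denoted in bold (e.g. $\mathbf{x}$ or $\mathbf{X}$). Calligraphic fonts (e.g. $\mathcal{X}$) represent a set, and $\mathcal{P}(\mathcal{X})$ denotes the set of all probability distributions on the alphabet $\mathcal{X}$. We use $\mathbb{Z}_+$ and $\mathbb{R}_+$ to denote the sets of non-negative integers and non-negative real numbers, respectively.

Our proofs make use of the method of types, and follow the notations in [10]. Specifically, the type of a sequence $\mathbf{x}$ with length $n$ is denoted by $P_{\mathbf{x}}$, where the type is the empirical distribution of this sequence, i.e., $P_{\mathbf{x}}(a) = N(a|\mathbf{x})/n$ for all $a \in \mathcal{X}$, where $N(a|\mathbf{x})$ is the number of occurrences of $a$ in the sequence $\mathbf{x}$. The subset of the probability distributions $\mathcal{P}(\mathcal{X})$ that can be types of $n$-sequences is denoted as

$$\mathcal{P}_n(\mathcal{X}) \triangleq \{ P \in \mathcal{P}(\mathcal{X}) : nP(x) \in \mathbb{Z}_+,\ \forall x \in \mathcal{X} \}, \tag{9}$$

and sometimes $P_n$ is used to emphasize the fact that $P_n \in \mathcal{P}_n(\mathcal{X})$. A type class $\mathcal{T}_{P_{\mathbf{x}}}$ is defined as the set of sequences that have type $P_{\mathbf{x}}$. Given some sequence $\mathbf{x}$, a sequence $\mathbf{y}$ of the same length has conditional type $P_{\mathbf{y}|\mathbf{x}}$ if $N(a, b|\mathbf{x}, \mathbf{y}) = P_{\mathbf{y}|\mathbf{x}}(b|a)\, N(a|\mathbf{x})$. Furthermore, the random variable corresponding to the conditional type of a random vector $\mathbf{Y}$ given $\mathbf{x}$ is denoted as $P_{\mathbf{Y}|\mathbf{x}}$. In addition, the set of possible conditional types given an input distribution $P_{\mathbf{x}}$ is denoted as

$$\mathcal{P}_n(\mathcal{Y}|P_{\mathbf{x}}) \triangleq \{ P_{\mathbf{y}|\mathbf{x}} : P_{\mathbf{x}} \times P_{\mathbf{y}|\mathbf{x}} \in \mathcal{P}_n(\mathcal{X} \times \mathcal{Y}) \}.$$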
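The notions of a type and of the set of $n$-types can be made concrete with a few lines of code. The sketch below is illustrative only (the helper names are hypothetical): it computes the empirical distribution of a sequence and counts the number of $n$-types, which is polynomial in $n$.

```python
from collections import Counter
from fractions import Fraction
from math import comb


def type_of(seq, alphabet):
    """Empirical distribution P_x(a) = N(a|x)/n of a sequence over an alphabet."""
    n = len(seq)
    counts = Counter(seq)
    return {a: Fraction(counts[a], n) for a in alphabet}


def num_types(n, alphabet_size):
    """Number of distinct n-types on an alphabet (stars-and-bars count)."""
    return comb(n + alphabet_size - 1, alphabet_size - 1)
```

The count returned by `num_types` never exceeds $(n+1)^{|\mathcal{X}|}$, which is the polynomial bound on the number of types used by the method of types.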

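A discrete memoryless channel (DMC) $W : \mathcal{X} \to \mathcal{Y}$ is defined by its input alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$, and conditional distribution $W(\cdot|x)$ of the output letter $Y$ when the channel input letter $X$ equals $x \in \mathcal{X}$. Also, we abbreviate $W(\cdot|x)$ as $W_x(\cdot)$ for notational simplicity. We define mutual information as

$$I(\Phi, W) = \sum_{x, y} \Phi(x) W(y|x) \log \frac{W(y|x)}{\Phi W(y)}, \qquad \Phi W(y) = \sum_{x} \Phi(x) W(y|x),$$

and the channel capacity is given by

$$C(W) = \max_{\Phi \in \mathcal{P}(\mathcal{X})} I(\Phi, W),$$

and the set of capacity-achieving distributions is $\Pi(W) \triangleq \{\Phi : I(\Phi, W) = C(W)\}$.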
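The mutual information $I(\Phi, W)$ can be evaluated directly from an input distribution and a channel matrix. The following is a minimal sketch, assuming the channel is given as a list of rows `W[x][y]` and using base-2 logarithms; the function names are hypothetical.

```python
import math


def output_distribution(phi, W):
    """Output distribution (phi W)(y) = sum_x phi(x) W(y|x)."""
    return [sum(phi[x] * W[x][y] for x in range(len(phi)))
            for y in range(len(W[0]))]


def mutual_information(phi, W):
    """I(phi, W) in bits for input distribution phi and channel rows W[x][y]."""
    out = output_distribution(phi, W)
    total = 0.0
    for x, px in enumerate(phi):
        for y, w in enumerate(W[x]):
            if px > 0 and w > 0:  # terms with zero probability contribute nothing
                total += px * w * math.log2(w / out[y])
    return total
```

For a binary symmetric channel with a uniform input this reproduces the familiar value of one bit minus the binary entropy of the crossover probability.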
A discrete memoryless source (DMS) is defined by a source alphabet $\mathcal{S}$, a reproduction alphabet $\hat{\mathcal{S}}$, a source distribution $P$ and a distortion measure $d : \mathcal{S} \times \hat{\mathcal{S}} \to \mathbb{R}_+$. Without loss of generality, we assume that for any $s \in \mathcal{S}$ there is $\hat{s} \in \hat{\mathcal{S}}$ such that $d(s, \hat{s}) = 0$. The rate-distortion function (RDF) of a DMS $(\mathcal{S}, \hat{\mathcal{S}}, P, d)$ is given by

$$R(P, D) = \min_{\Lambda :\, \mathbb{E}_{P, \Lambda}\, d(S, \hat{S}) \le D} I(P, \Lambda),$$

where $I(P, \Lambda)$ is the mutual information over a channel with input distribution $P(S)$ and conditional distribution $\Lambda$.
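For a concrete instance of the rate-distortion function and of the OPTA defined through (1), consider the following sketch. It assumes a Bernoulli($p$) source with Hamming distortion, for which the standard closed form is the difference of two binary entropies, and a binary symmetric channel with crossover probability $\delta$. These example models and all function names are assumptions for illustration, not part of the paper.

```python
import math


def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def rate_distortion_binary(p, D):
    """R(P, D) in bits for a Bernoulli(p) source under Hamming distortion."""
    q = min(p, 1 - p)
    return h2(q) - h2(D) if D < q else 0.0


def opta_binary_bsc(p, delta, rho, tol=1e-12):
    """Solve R(P, D*) = rho * C(W) for D* by bisection, with C(W) = 1 - h2(delta)."""
    target = rho * (1 - h2(delta))
    q = min(p, 1 - p)
    if target >= h2(q):
        return 0.0  # the channel supports lossless reproduction
    lo, hi = 0.0, q
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rate_distortion_binary(p, mid) > target:
            lo = mid  # rate still too high, so the distortion must grow
        else:
            hi = mid
    return (lo + hi) / 2
```

Bisection is used because $R(P, D)$ is non-increasing in $D$, so the equation has a unique solution on the interval where the rate is positive.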

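A discrete memoryless joint source-channel coding (JSCC) problem consists of a DMS $(\mathcal{S}, \hat{\mathcal{S}}, P, d)$, a DMC $W : \mathcal{X} \to \mathcal{Y}$ and a bandwidth expansion factor $\rho \in \mathbb{R}_+$. A JSCC scheme is comprised of an encoder mapping $f_{J;n} : \mathcal{S}^n \to \mathcal{X}^{\lfloor \rho n \rfloor}$ and a decoder mapping $g_{J;n} : \mathcal{Y}^{\lfloor \rho n \rfloor} \to \hat{\mathcal{S}}^n$. Given a source block $\mathbf{s}$, the encoder maps it to a sequence $\mathbf{x} = f_{J;n}(\mathbf{s}) \in \mathcal{X}^{\lfloor \rho n \rfloor}$ and transmits this sequence through the channel. The decoder receives a sequence $\mathbf{y} \in \mathcal{Y}^{\lfloor \rho n \rfloor}$ distributed according to $W^{\lfloor \rho n \rfloor}(\cdot|\mathbf{x})$, and maps it to a source reconstruction $\hat{\mathbf{s}}$. The corresponding distortion is given by (3).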

For our analysis, we also define the following information quantities [5]: given input distribution $\Phi$ and channel $W$, we define the information density of a channel as

$$i(x, y) = \log \frac{dW(y|x)}{d\Phi W(y)},$$

divergence variance as

$$V(P \| Q) = \mathbb{E}_P\left[ \left( \log \frac{dP}{dQ} \right)^2 \right] - [D(P \| Q)]^2,$$

unconditional information variance as

$$U(\Phi, W) = \mathrm{Var}[i(X, Y)] = V(\Phi \times W \,\|\, \Phi \times \Phi W),$$

where $(X, Y)$ has joint distribution $\Phi \times W$, conditional information variance as

$$V(\Phi, W) = \mathbb{E}[\mathrm{Var}[i(X, Y)|X]] = V(W \| \Phi W | \Phi) = \sum_{x} \Phi(x) \left\{ \sum_{y} W(y|x) \left[ \log \frac{W(y|x)}{\Phi W(y)} \right]^2 - [D(W_x \| \Phi W)]^2 \right\},$$

and maximal/minimal conditional information variance as

$$V_{\max}(W) = \max_{\Phi \in \Pi(W)} V(\Phi, W), \qquad V_{\min}(W) = \min_{\Phi \in \Pi(W)} V(\Phi, W).$$

For simplicity, we assume all channels in this paper satisfy $V_{\min} > 0$, which holds for most channels (see [5, Appendix H] for a detailed discussion).
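The unconditional and conditional information variances can be computed side by side from their definitions as a variance of the information density and as an average of per-input variances. The sketch below is a minimal illustration, assuming a finite channel matrix `W[x][y]` and base-2 logarithms; the function name is hypothetical.

```python
import math


def information_variances(phi, W):
    """Return (I, U, V): mutual information, unconditional and conditional
    information variance, for input distribution phi and channel rows W[x][y]."""
    out = [sum(phi[x] * W[x][y] for x in range(len(phi)))
           for y in range(len(W[0]))]
    mi = 0.0        # E[i(X, Y)]
    second = 0.0    # E[i(X, Y)^2]
    cond_var = 0.0  # E[Var[i(X, Y) | X]]
    for x, px in enumerate(phi):
        if px == 0:
            continue
        d_x = 0.0   # D(W_x || phi W)
        m2_x = 0.0  # E[i(x, Y)^2 | X = x]
        for y, w in enumerate(W[x]):
            if w > 0:
                dens = math.log2(w / out[y])
                d_x += w * dens
                m2_x += w * dens * dens
        mi += px * d_x
        second += px * m2_x
        cond_var += px * (m2_x - d_x * d_x)
    return mi, second - mi * mi, cond_var
```

The two variances differ by the variance over $X$ of the per-input divergence $D(W_x \| \Phi W)$, so the unconditional variance is never smaller than the conditional one, and the two coincide whenever that divergence does not depend on $x$.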



In this paper, we use the notation $O(\cdot)$, $\Omega(\cdot)$ and $\Theta(\cdot)$, where $f(n) = O(g(n))$ if and only if $\limsup_{n \to \infty} |f(n)/g(n)| < \infty$, $f(n) = \Omega(g(n))$ if and only if $\liminf_{n \to \infty} |f(n)/g(n)| \ge 1$, and $f(n) = \Theta(g(n))$ if and only if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$. In addition, $f(n) \le O(g(n))$ means that $f(n) \le c\, g(n)$ for some $c > 0$ and sufficiently large $n$. We also use the notation $\mathrm{poly}(n)$ to denote a sequence of numbers that is polynomial in $n$, i.e., $\mathrm{poly}(n) = \Theta(n^d)$ if the polynomial has degree $d$.
III. The Dispersion of UEP Channel Coding

In this section we introduce the dispersion of unequal error protection (UEP) coding. We use this framework in the next section to prove our main JSCC result, though we directly use one lemma proven here instead of the UEP dispersion theorem.¹

Given $k$ classes of messages $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_k$, where $|\mathcal{M}_i| = N_i$, we can represent a message $m \in \mathcal{M} \triangleq \bigcup_i \mathcal{M}_i$ by its class $i$ and content $j$, i.e., $m = (i, j)$, where $i \in \{1, 2, \ldots, k\}$ and $j \in \{1, 2, \ldots, N_i\}$. A scheme is comprised of an encoding function $f_{C;n} : \mathcal{M} \to \mathcal{X}^n$ and a decoder mapping $g_{C;n} : \mathcal{Y}^n \to \mathcal{M}$. The error probability for message $m$ is $P_e(m) \triangleq \mathbb{P}[\hat{m} \ne m]$, where $\hat{m}$ is the decoder output. We say that a scheme $(f_{C;n}, g_{C;n})$ is a UEP scheme with error probabilities $e_1, e_2, \ldots, e_k$ and rates $R_1, R_2, \ldots, R_k$ if

$$P_e(m = (i, j)) \le e_i$$

for all messages, and

$$R_i = \frac{1}{n} \log N_i \quad \text{for all } i \in \{1, 2, \ldots, k\},$$

where $n$ is the block length. We denote the codewords for message set $\mathcal{M}_i$ by $\mathcal{A}_i$, i.e.,

$$\mathcal{A}_i = \{ f_{C;n}(m = (i, j)),\ j = 1, 2, \ldots, N_i \}.$$

As discussed in [5], dispersion gives a meaningful characterization on the rate loss at a certain block length and 
error probability. Here, we show that similar results hold for UEP channel codes. 

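Theorem 1 (UEP Dispersion, Achievability). Given a DMC $(\mathcal{X}, \mathcal{Y}, W)$, a sequence of integers $k_n = \mathrm{poly}(n)$, an infinite sequence of real numbers $\{\epsilon_i \in (0, 1),\ i \in \mathbb{Z}_+\}$ and an infinite sequence of (not necessarily distinct) distributions $\{\Phi^{(i)} \in \mathcal{P}(\mathcal{X}),\ i \in \mathbb{Z}_+\}$, if $V(\Phi^{(i)}, W) > 0$ for all $i$, then there exists a sequence of UEP schemes with $k_n$ classes of messages and error probabilities $e_i \le \epsilon_i$ such that for all $1 \le i \le k_n$,

$$R_i = I(\Phi^{(i)}, W) - \sqrt{\frac{V_i}{n}}\, Q^{-1}(\epsilon_i) + O\!\left(\frac{\log n}{n}\right), \tag{10}$$

where $V_i \triangleq V(\Phi^{(i)}, W)$ is the conditional information variance defined in Section II.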
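To illustrate how (10) assigns a different rate to each message class, the sketch below evaluates its leading terms for a list of classes. It is an illustration only: the correction term is ignored, and the function name and the triples passed to it are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def uep_class_rates(classes, n):
    """Leading terms of (10) for each class.

    classes: list of (mutual_information, info_variance, eps) triples,
    one per message class, all supplied by the caller.
    """
    std = NormalDist()
    return [mi - sqrt(v / n) * std.inv_cdf(1.0 - eps)
            for (mi, v, eps) in classes]
```

With equal mutual information and variance, a class that demands a smaller error probability is assigned a lower rate, which is the unequal-protection trade-off expressed by the theorem.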

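The following corollary is immediate, substituting distributions $\{\Phi^{(i)} \in \Pi(W)\}$.

Corollary 2. In the setting of Theorem 1, there exists a sequence of UEP codes with error probabilities $e_i \le \epsilon_i$ such that

$$R_i = C(W) - \sqrt{\frac{V_{C_i}}{n}}\, Q^{-1}(\epsilon_i) + O\!\left(\frac{\log n}{n}\right),$$

where

$$V_{C_i} = \begin{cases} V_{\min}(W), & \epsilon_i \le \frac{1}{2}, \\ V_{\max}(W), & \epsilon_i > \frac{1}{2}. \end{cases}$$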



Remark 1. In the theorem, the coefficient of the correction term $O(\log n / n)$ is unbounded for error probabilities that approach zero or one.

Remark 2. In the theorem, the message classes are cumulative, i.e., for each codeword length $n$, $k_n$ message classes are used, which include the $k_{n-1}$ classes used for $n - 1$. Trivially, at least the same performance is achievable where only the message classes $k_{n-1} + 1, \ldots, k_n$ are used. Thus, the theorem also applies to disjoint message sets, as long as their size is polynomial in $n$.

Remark 3. The rates of Corollary 2 are also necessary (up to the correction term). That is, any UEP code with error probabilities $e_1, e_2, \ldots, e_{k_n}$ such that $e_i \le \epsilon_i$ must satisfy

$$R_i \le C(W) - \sqrt{\frac{V_{C_i}}{n}}\, Q^{-1}(\epsilon_i) + O\!\left(\frac{\log n}{n}\right),$$

where $V_{C_i}$ is the channel dispersion term appearing in Corollary 2. This is straightforward to see, as Theorem 48 of [5] shows that this is a bound in the single-codebook case.

¹ In this section we use $n$ to denote the channel code block length, while in Sections IV to VI we use $m = \lfloor \rho n \rfloor$ as the channel code block length in the JSCC setting.

Remark 4. When taking a single codebook, i.e. $k_n = 1$ for all $n$, Corollary 2 reduces to the achievability part of the channel dispersion result [5, Theorem 49]. However, we have taken a slightly different path: we use constant-composition codebooks, resulting in the conditional information variance $V(\Phi, W)$, rather than i.i.d. codebooks, which result in the generally higher (worse) unconditional information variance. As discussed in [5], these quantities are equal when a capacity-achieving distribution is used, but a scheme achieving $V(\Phi, W)$ may have an advantage under a cost constraint. Furthermore, we feel that our approach is more insightful, since it demonstrates that the stochastic effect that governs the dispersion is in the channel realization only, and not in the channel input (dual to the source dispersion being set by the source type only).

The proof of Theorem 1 is based on the same construction used for the UEP exponent in [2]. A decoder that 
operates based on empirical mutual information (with varying threshold according to the codebook) is used, and if 
there is a unique codeword that has high enough empirical mutual information, it is declared; otherwise an error 
will be reported. This decoding rule may introduce two types of errors: the empirical mutual information for the 
actual codeword is not high enough, or the empirical mutual information for a wrong codeword is too high. 
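The decoding rule just described can be sketched in a few lines. The code below is a simplified illustration rather than the construction of [2]: it computes the empirical mutual information from the joint type of a codeword and the received sequence, subtracts the class rate, and declares a message only if exactly one candidate clears the threshold. The data layout (a list of codebooks, one per class) and all names are assumptions made for the example.

```python
import math
from collections import Counter


def empirical_mi(x, y):
    """Empirical mutual information (bits) of the joint type of (x, y)."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum(c / n * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def threshold_decode(codebooks, rates, y, gamma):
    """Return (class, index) of the unique codeword whose empirical mutual
    information minus its class rate is at least gamma; otherwise None."""
    passing = [(i, j)
               for i, book in enumerate(codebooks)
               for j, x in enumerate(book)
               if empirical_mi(x, y) - rates[i] >= gamma]
    return passing[0] if len(passing) == 1 else None
```

A return value of `None` covers both error types discussed in the text: either no codeword reaches the threshold, or more than one does.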

The following two lemmas address the effect of these error events. Lemma 3 shows that the empirical mutual information of the correct codeword is approximately normally distributed via the Central Limit Theorem, hence the probability of the first type of error (the empirical mutual information falls below the expected mutual information) is governed by the Q-function, from which we can obtain an expression for the rate redundancy w.r.t. the empirical mutual information. Lemma 4 shows that if we choose the codebook properly, the probability of the second type of error can be made negligible, relative to the probability of the first type of error.

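Lemma 3 (Rate redundancy). For a DMC $(\mathcal{X}, \mathcal{Y}, W)$, given an arbitrary distribution $\Phi \in \mathcal{P}(\mathcal{X})$ with $V(\Phi, W) > 0$, and a fixed probability $\epsilon$, let $\Phi_n \in \mathcal{P}_n(\mathcal{X})$ be an $n$-type that approximates $\Phi$ as

$$\|\Phi - \Phi_n\|_\infty \le \frac{1}{n}. \tag{11}$$

Let the rate redundancy $\Delta R$ be the infimal value such that for $\mathbf{x} \in \mathcal{T}_{\Phi_n}$,

$$\mathbb{P}\left[ I(\Phi_n, P_{\mathbf{y}|\mathbf{x}}) \le I(\Phi, W) - \Delta R,\ \mathbf{Y} \sim W^n(\cdot|\mathbf{x}) \right] = \epsilon, \tag{12}$$

then

$$\Delta R = \sqrt{\frac{V(\Phi, W)}{n}}\, Q^{-1}(\epsilon) + O\!\left(\frac{\log n}{n}\right). \tag{13}$$

Furthermore, the result holds if we replace (12) with

$$\mathbb{P}\left[ I(\Phi_n, P_{\mathbf{y}|\mathbf{x}}) \le I(\Phi, W) - \Delta R,\ \mathbf{Y} \sim W^n(\cdot|\mathbf{x}) \right] = \epsilon + \delta_n, \tag{14}$$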

as long as $\delta_n = O(\log n / \sqrt{n})$.

Proof sketch for Lemma 3: Applying a Taylor expansion to the empirical mutual information $I(\Phi_n, P_{\mathbf{y}|\mathbf{x}})$, where $\mathbf{Y}$ is the channel output corresponding to channel input $\mathbf{x}$, we have

$$I(\Phi_n, P_{\mathbf{y}|\mathbf{x}}) \approx I(\Phi_n, W) + \sum_{x, y} \left( P_{\mathbf{y}|\mathbf{x}}(y|x) - W(y|x) \right) I'_W(y|x),$$

where the higher order terms only contribute to the correction term in the desired result, and

$$I'_W(y|x) \triangleq \left. \frac{\partial I(\Phi_n, V)}{\partial V(y|x)} \right|_{V = W}.$$

These first order terms can be represented by a sum of independent random variables with total variance $V(\Phi_n, W)/n$ and finite third moment, which facilitates the application of the Berry-Esseen theorem (see, e.g., [11, Ch. XVI.5]) and gives

$$\mathbb{P}\left[ I(\Phi_n, P_{\mathbf{y}|\mathbf{x}}) \le I(\Phi_n, W) - \Delta R \right] = Q\left( (\Delta_n + \Delta R) \sqrt{\frac{n}{V(\Phi_n, W)}} \right),$$

where $\Delta_n = O(\log n / n)$. Finally, we can show that given (11), $|V(\Phi, W) - V(\Phi_n, W)|$ and $|I(\Phi, W) - I(\Phi_n, W)|$ are small enough for (13) to hold. ■

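Lemma 4. For a DMC $(\mathcal{X}, \mathcal{Y}, W)$, there exists a sequence of UEP codes with $k_n = \mathrm{poly}(n)$ classes of messages, $\mathcal{A}_i \subseteq \mathcal{T}_{\Phi_n^{(i)}}$, and rates $R_1, R_2, \ldots, R_{k_n}$, where $R_i \le H(\Phi_n^{(i)}) - \eta_n$,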

Vn = ^ + log(n + 1) + log kn + l), (15) 

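such that for any given $\mathbf{x} \in \mathcal{A}_i$, $i \in \{1, 2, \ldots, k_n\}$, any $\mathbf{x}' \ne \mathbf{x}$ with $\mathbf{x}' \in \mathcal{A}_{i'}$, $i' \in \{1, 2, \ldots, k_n\}$, and any $\gamma \in \mathbb{R}$,

$$\mathbb{P}\left[ I(\Phi_n^{(i')}, P_{\mathbf{y}|\mathbf{x}'}) - R_{i'} \ge \gamma,\ \mathbf{Y} \sim W^n(\cdot|\mathbf{x}) \right] \le (n+1)^{|\mathcal{X}|^2 |\mathcal{Y}|} \exp\left\{ -n \left[ |R_{i'} + \gamma - \eta_n|^+ - R_{i'} \right] \right\}.$$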

Proof sketch for Lemma 4: This proof is based on the coding scheme in Lemma 6 of [2]. In that construction, given a channel conditional type $V$, the fraction of the output sequences corresponding to $\mathcal{A}_{i'}$ that overlap with the output sequences of another codeword $\mathbf{x}$ in a message set $\mathcal{A}_i$ decays exponentially with the empirical mutual information $I(\Phi_n^{(i')}, V)$. Then, by using a decoder based on empirical mutual information and by bounding the size of the set of output sequences that cause errors for the empirical mutual information decoder, we can show the desired result. ■

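The detailed proofs of Lemmas 3 and 4 are given in Appendix A-B. Below we present the proof for Theorem 1.

Proof of Theorem 1: Fix some codeword length $n$. Without loss of generality, assume that the message is $m = (i, j)$ in class $i$, which is mapped to a channel input $\mathbf{x}(i, j) \in \mathcal{A}_i$. Each codebook $\mathcal{A}_i$ is drawn uniformly over the type class of $\Phi_n^{(i)} \in \mathcal{P}_n(\mathcal{X})$, where $\Phi_n^{(i)}$ relates to $\Phi^{(i)}$ (which is a general probability distribution that is not necessarily in $\mathcal{P}_n(\mathcal{X})$) by

$$\|\Phi_n^{(i)} - \Phi^{(i)}\|_\infty \le \frac{1}{n}.$$

For any $\mathbf{y} \in \mathcal{Y}^n$, define the measure for message $m$:

$$a_m(\mathbf{y}) = I(\Phi_n^{(i)}, P_{\mathbf{y}|\mathbf{x}(m)}) - R_i,$$

and let the decoder mapping $g_{C;n} : \mathcal{Y}^n \to \mathcal{M}$ be defined as follows, using thresholds $\gamma_n$ to be specified:

$$g_{C;n}(\mathbf{y}) = \begin{cases} m, & a_m(\mathbf{y}) \ge \gamma_n > \max_{m' \ne m} a_{m'}(\mathbf{y}), \\ \text{decoding failure}, & \text{otherwise}. \end{cases}$$

The error event is the union of the following two events:

$$\mathcal{E}_1 \triangleq \{ a_m(\mathbf{y}) < \gamma_n \}, \tag{16}$$

$$\mathcal{E}_2 \triangleq \{ \exists\, m' \ne m,\ m' \in \mathcal{M} \ \text{s.t.}\ a_{m'}(\mathbf{y}) \ge \gamma_n \}. \tag{17}$$

Let $m' = (i', j')$ be a generic codeword different from $m$. For simplicity, we denote $\mathbf{x}(i, j)$ and $\mathbf{x}(i', j')$ by $\mathbf{x}$ and $\mathbf{x}'$ respectively in the rest of the proof. Note that $i'$ may be equal to $i$.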

We now choose ^ 

7„ = 2ry„ + — log + - log n, (18) 
zn n 



where $\eta_n$ is defined by (15) in Lemma 4 and $a = (d+1)/2$, where $d$ is the degree of the polynomial $k_n$. Note that $\gamma_n = O(\log n / n)$. Lemma 4 shows

P[^2] = J]P[/ (l>r,^V|x') -^i' >7n,x' G A',Y~T^(-|x) 

exp |-ninin [\Ri, + 7™ - r/„|+ - i^j/] | 
1 , , logn" 

+ 7^ log kn H 

zn n 



exp — n 

■\/ Aj7 



1 



To analyze £i, let 

Note that P [£i] may be written as 



ARi = I ( <I>^'\W] -R, 



(19) 



Now employing (14) in Lemma 3 with e = and 
we have 

V n y n 

is achievable. By the union bound, the error probabilities are no more than ei, as required. Finally, (19) leads to 



IV. Main Result: JSCC Dispersion 
We now utilize the UEP framework in Section III to arrive at our main result. 

For the sake of investigating the finite block-length behavior, we consider the excess distortion event $\mathcal{E}(D)$ defined in (2). When the distortion level is held fixed, Csiszar gives lower and upper bounds on the exponential decay of the excess distortion probability [3]. In this work, we fix the excess distortion probability to be constant with the blocklength $n$,

$$\mathbb{P}[\mathcal{E}(D_n)] = \epsilon, \tag{20}$$

and examine how the distortion thresholds $D_n$ approach the OPTA $D^*$ (the distortion achieving equality in (1)), or equivalently, how $R(P, D_n)$ approaches $R(P, D^*) = \rho\, C(W)$. We find that it is governed by the joint source-channel dispersion (8). In this formula, the source dispersion is given by [6]:

$$V_S(P, D) = \mathrm{Var}\left[ \left. \frac{\partial}{\partial Q(S)} R(Q, D) \right|_{Q = P} \right], \tag{21}$$

and the channel dispersion $V_C(W)$ is given by $V_{\min}(W)$, which is assumed to be equal to $V_{\max}(W)$.



Theorem 5. Consider a JSCC problem with a DMS $(\mathcal{S}, \hat{\mathcal{S}}, P, d)$, a DMC $(\mathcal{X}, \mathcal{Y}, W)$ and bandwidth expansion factor $\rho$. Let the corresponding OPTA be $D^*$. Assume that $R(Q, D)$ is differentiable w.r.t. $D$ and twice differentiable w.r.t. $Q$ in some neighborhood of $(P, D^*)$. Also assume that the channel dispersion $V_{\min}(W) = V_{\max}(W) > 0$. Then for a fixed excess distortion probability $0 < \epsilon < 1$, the optimal distortion thresholds $D_n$ satisfy:

$$R(P, D_n) = \rho \cdot C(W) - \sqrt{\frac{V_J(P, W, \rho)}{n}}\, Q^{-1}(\epsilon) + O\!\left(\frac{\log n}{n}\right),$$

where $V_J(P, W, \rho)$ is the JSCC dispersion (8).

Fig. 1. Heuristic view of the main JSCC excess distortion event. The ellipse denotes the approximate one-standard-deviation region of the source-channel pair, while the gray area denotes the set of source-channel realizations leading to excess distortion.

We can give a heuristic explanation of this result, graphically depicted in Fig. 1. We know that the rate needed for describing the source is approximately Gaussian, with mean $R(P, D_n)$ and variance $V_S(P, D_n)/n$. Similarly, the mutual information supplied by the channel is approximately Gaussian, with mean $\rho C(W)$ and variance $\rho V_C(W)/n$. We can now construct a codebook per source type, and map this set of codebooks to a channel UEP code. According to Section III, the dispersion of UEP given the rate of the chosen codebook is the same as when only having that codebook. Consequently, an error occurs if the source and channel empirical behavior $(P_{\mathbf{s}}, P_{\mathbf{y}|\mathbf{x}})$ is such that

$$R(P_{\mathbf{s}}, D_n) > \rho \cdot I(\Phi, P_{\mathbf{y}|\mathbf{x}}).$$

The difference between the left and right hand sides is the difference of two independent approximately-Gaussian random variables, thus is approximately Gaussian with mean $R(P, D_n) - \rho C(W)$ and variance $V_J(P, W, \rho)/n$, yielding (22) up to the correction term. However, for the proof we need to carefully consider the deviations from Gaussianity of both source and channel behaviors.
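The Gaussian heuristic translates directly into a numerical estimate of the excess-distortion probability for a given threshold. The sketch below is a back-of-the-envelope illustration under that heuristic only: it treats the difference between the required source rate and the supplied channel mutual information as exactly Gaussian with variance $V_J/n$, and the function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def excess_distortion_prob(rate_at_dn, rho, capacity, v_joint, n):
    """Gaussian estimate of P[R(P_s, D_n) > rho * I(Phi, P_{y|x})].

    rate_at_dn is R(P, D_n); the difference of the two quantities is
    modeled as Gaussian with mean rate_at_dn - rho*capacity and
    variance v_joint / n.
    """
    gap = rho * capacity - rate_at_dn
    return 1.0 - NormalDist().cdf(gap * sqrt(n / v_joint))
```

Solving this expression for the rate at a fixed probability recovers the leading terms of the theorem, which is why the heuristic and the formal statement agree up to the correction term.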

Remark 5. In the (rather pathological) case where $V_{\min}(W) \ne V_{\max}(W)$, we can no longer draw the ellipse of Fig. 1. This is since the variance of the channel mutual information will be different between codebooks that have error probability smaller or larger than 1/2. We can use $V_{\min}$ and $V_{\max}$ for upper and lower bounds on the JSCC dispersion. Also, when $\epsilon$ is close to zero or one, the dispersion of the channel part is very well approximated by $V_{\min}$ or $V_{\max}$, respectively.

Remark 6. The source and channel dispersions are known to be the second derivatives (with respect to the rate) of the source exponent at rate $R(P, D)$ and of the channel exponent at rate $C(W)$, respectively. Interestingly, the JSCC dispersion (8) is also connected to the second derivative of the JSCC exponent [3]:

$$E(P, W, D, \rho) = \min_{R(P, D) \le R \le C} \left[ E_S(P, D) + \rho E_C(W) \right]$$

(where $E_S$ and $E_C$ are the lossy source coding and sphere-packing exponents,² respectively) via

$$\left. \frac{\partial^2 E(P, W, D, \rho)}{\partial R(P, D)^2} \right|_{D = D^*(P, W, \rho)},$$

where in the derivative $P$ is held fixed.

The achievability part of Theorem 5 relies on the following lemma.

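Lemma 6 (JSCC Distortion Redundancy). Consider a JSCC problem with a DMS $(\mathcal{S}, \hat{\mathcal{S}}, P, d)$, a DMC $(\mathcal{X}, \mathcal{Y}, W)$ and bandwidth expansion factor $\rho$. Let $n$ be the source block length, and let $m = \lfloor \rho n \rfloor$ be the channel block length. Let $\Phi$ be an arbitrary distribution on $\mathcal{X}$, and let $\Phi_m \in \mathcal{P}_m(\mathcal{X})$ be an $m$-type that approximates $\Phi$ as

$$\|\Phi - \Phi_m\|_\infty \le \frac{1}{m}.$$

Let the channel input $\mathbf{x} \in \mathcal{X}^m$ have type $\Phi_m$. Further, let $D^*(\Phi)$ be the solution to $R(P, D^*(\Phi)) = \rho I(\Phi, W)$. Assume that $R(Q, D)$, the RDF of a source $Q$ with the same distortion measure, is twice differentiable w.r.t. $D$ and the elements of $Q$ in some neighborhood of $(P, D^*(\Phi))$. Let $\epsilon$ be a given probability and let $D_n$ be the infimal value s.t.

$$\mathbb{P}\left[ R(P_{\mathbf{s}}, D_n) > \rho I(\Phi_m, P_{\mathbf{y}|\mathbf{x}}) \right] = \epsilon. \tag{22}$$

Then, as $n$ grows,

$$R(P, D_n) = \rho I(\Phi, W) - \sqrt{\frac{V_S(P, D^*(\Phi)) + \rho V(\Phi, W)}{n}}\, Q^{-1}(\epsilon) + O\!\left(\frac{\log n}{n}\right). \tag{23}$$

In addition, for any channel input (i.e., $\Phi_m$ is not restricted and may also depend upon the source sequence),

$$R(P, D_n) \le \rho C(W) - \sqrt{\frac{V_J(P, W, \rho)}{n}}\, Q^{-1}(\epsilon) + O\!\left(\frac{\log n}{n}\right), \tag{24}$$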

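where $V_J(P, W, \rho)$ is given by (8). Furthermore, all the above holds even if we replace (22) with

$$\mathbb{P}\left[ R(P_{\mathbf{s}}, D_n) > \rho I(\Phi_m, P_{\mathbf{y}|\mathbf{x}}) + \xi_n \right] = \epsilon + \zeta_n, \tag{25}$$

for any given (vanishing) sequences $\xi_n, \zeta_n$, as long as $\xi_n = O(\log n / n)$ and $\zeta_n = O(\log n / \sqrt{n})$.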

Proof sketch for Lemma 6: Similar to Lemma 3, we apply a Taylor expansion to $R(P_{\mathbf{s}}, D_n)$ and show that the first order term again can be expressed as a sum of $n$ independent random variables, and that neglecting higher order terms does not affect the statement. Then $R(P_{\mathbf{s}}, D_n) - \rho I(\Phi_m, P_{\mathbf{y}|\mathbf{x}})$ can be shown to be the sum of $n + m$ independent random variables, with total variance essentially $(V_S + \rho V(\Phi, W))/n$. Finally, similar to the derivation in Lemma 3, we apply the Berry-Esseen theorem and show that (23) and (25) are true. ■

The detailed proof of Lemma 6 is given in Appendix B-B.

The converse part of Theorem 5 builds upon the following result, which states that for any JSCC scheme, the 
excess-distortion probability must be very high if the empirical mutual information over the channel is higher than 
the empirical source RDF. 

Lemma 7 (Joint source channel coding converse with fixed types). For a JSCC problem, given a source type $Q \in \mathcal{P}_n(\mathcal{S})$ and a channel input type $\Phi \in \mathcal{P}_n(\mathcal{X})$, let $G(Q, \Phi)$ be the set of source sequences in $\mathcal{T}_Q$ that are mapped (via the JSCC encoder $f_{J;n}$) to channel codewords with type $\Phi$, i.e.,

$$G(Q, \Phi) \triangleq \{ \mathbf{s} \in \mathcal{T}_Q : \mathbf{x} = f_{J;n}(\mathbf{s}) \in \mathcal{T}_\Phi \}.$$

Define the set of all channel outputs that cover $\mathbf{s}$ with distortion $D$ as $B(\mathbf{s}, D)$, i.e.,

$$B(\mathbf{s}, D) \triangleq \{ \mathbf{y} \in \mathcal{Y}^m : d(\mathbf{s}, g_{J;n}(\mathbf{y})) \le D \}, \tag{26}$$

where $m = \lfloor \rho n \rfloor$ and $g_{J;n}$ is the JSCC decoder. If

$$|G(Q, \Phi)| \ge \frac{1}{(n+1)^{|\mathcal{X}|+1}}\, |\mathcal{T}_Q|, \tag{27}$$

then for a given distortion $D$ and a channel with constant composition conditional distribution $V \in \mathcal{P}_m(\mathcal{Y}|\Phi)$, we have

$$\frac{1}{|G(Q, \Phi)|} \sum_{\mathbf{s}_i \in G(Q, \Phi)} \frac{|\mathcal{T}_V(f(\mathbf{s}_i)) \cap B(\mathbf{s}_i, D)|}{|\mathcal{T}_V(f(\mathbf{s}_i))|} \le p(n) \exp\left\{ -n \left[ R(Q, D) - \rho I(\Phi, V) \right]^+ \right\}, \tag{28}$$

where $p(n)$ is a polynomial that depends only on the source, channel and reconstruction alphabet sizes and $\rho$.

The detailed proof of Lemma 7 is given in Appendix B-B. The proof uses an approach similar to that in the strong channel coding converse [8].

² The sphere-packing exponent is only achievable when $R$ is close to $C$, but this is sufficient for the derivative at $R = C$.

Below we present the proof for Theorem 5. The achievability proof is based on Lemmas 4 and 6, where we do not directly use Lemma 3 or Theorem 1, thus we do not suffer from the non-uniformity problem (see Remark 1). In other words, rather than evaluating the error probability per UEP codebook, we directly evaluate the average over all codebooks.

Proof: Achievability: Let

$$k_n = (n+1)^{|\mathcal{S}|+1} = \mathrm{poly}(n). \tag{29}$$

At each block length $n$, we construct a source code $\mathcal{C} = \{\mathcal{C}_i\}$ as follows (the index $n$ is omitted for notational simplicity). Each code $\mathcal{C}_i$ corresponds to one type $Q_i \in \mathcal{P}_n(\mathcal{S}) \cap \Omega_n$, where

$$\Omega_n \triangleq \left\{ Q : \|P - Q\|_2^2 \le |\mathcal{S}|\, \frac{\log n}{n} \right\}.$$

According to the refined type-covering lemma [12], there exist codes $\mathcal{C}_i$ of rates

$$R_i \le R(Q_i, D_n) + O\!\left(\frac{\log n}{n}\right), \tag{30}$$

that completely $D_n$-cover the corresponding types (where the redundancy term is uniform). We choose these to be the rates of the source code. The chosen codebook and codeword indices are then communicated using a dispersion-optimal UEP scheme as described in Section III with a capacity-achieving channel input distribution $\Phi \in \Pi(W)$. Specifically, each source codebook is mapped into a channel codebook of block length $\lfloor \rho n \rfloor$ and rate

$$\tilde{R}_i = \frac{R_i}{\rho}, \tag{31}$$

as long as

$$\tilde{R}_i \le H(\Phi) - \eta_n, \tag{32}$$

where $\eta_n$ is defined in Lemma 4. Otherwise, the mapping is arbitrary and we assume that an error will occur. The UEP scheme is thus used with different message classes at each $n$; such a scheme can only perform better than a scheme where the message classes accumulate (see Remark 2), thus we can use the results of Section III with the number of codebooks

$$\sum_{n'=1}^{n} |\mathcal{P}_{n'}(\mathcal{S}) \cap \Omega_{n'}| \le n \cdot |\mathcal{P}_n(\mathcal{S})| \le (n+1)^{|\mathcal{S}|+1} = k_n,$$

where $\mathcal{P}_n(\cdot)$ is defined in (9).

Error analysis: an excess-distortion event can occur only if one of the following events happened:

1) $P_{\mathbf{s}} \notin \Omega_n$, where $P_{\mathbf{s}}$ is the type of $\mathbf{s}$.

2) $R_i/\rho > H(\Phi) - \eta_n$.

3) $\mathcal{E}_2$ (17): an unrelated channel codeword had high empirical mutual information.

4) $\mathcal{E}_1$ (16): the true channel codeword had low empirical mutual information.

We show that the first three events only contribute to the correction term. According to [6, Lemma 2], 



11 



By our assumption on the differentiability of $R(P, D)$, for large enough $n$ the second event does not happen for any type in $\Omega_n$. By Lemma 4, the probability of the third event is at most $O(1/\sqrt{n})$, uniformly. Thus, by the union bound, we need the probability of the last event to be at most $\epsilon_n = \epsilon - O(1/\sqrt{n})$.

Now following the analysis of $\mathcal{E}_1$ in the proof of Theorem 1, (19) indicates that event $\mathcal{E}_1$ is equivalent to

$$I(\Phi, P_{\mathbf{y}|\mathbf{x}}) < \frac{R_i}{\rho} + \gamma_n,$$

where $\gamma_n$ is defined in (18). Then (30) and (31) indicate that this is equivalent to

$$\rho I(\Phi, P_{\mathbf{y}|\mathbf{x}}) < R(P_{\mathbf{s}}, D_n) + O\!\left(\frac{\log n}{n}\right).$$

On account of Lemma 6, this can indeed be satisfied with $\epsilon_n$ as required.

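Converse: At the first stage of the proof we suppress the dependence on the block length $n$ for conciseness. We first lower-bound the excess-distortion probability given that the source type is some $Q \in \mathcal{P}_n(\mathcal{S})$.

Let $\alpha(Q, \Phi) \triangleq \mathbb{P}[P_{\mathbf{x}} = \Phi \,|\, P_{\mathbf{s}} = Q]$ be the probability of having input type $\Phi$ given that the source type is $Q$. Noting that given a source type, all strings within a type class are equally likely, we have

$$\alpha(Q, \Phi) = \frac{|\{\mathbf{s} \in \mathcal{T}_Q : \mathbf{x} = f_{J;n}(\mathbf{s}) \in \mathcal{T}_\Phi\}|}{|\mathcal{T}_Q|} = \frac{|G(Q, \Phi)|}{|\mathcal{T}_Q|}.$$

Now we have

$$\mathbb{P}[\mathcal{E}(D) \,|\, P_{\mathbf{s}} = Q] = \sum_{\Phi} \alpha(Q, \Phi)\, \mathbb{P}[\mathcal{E}(D) \,|\, P_{\mathbf{s}} = Q, P_{\mathbf{x}} = \Phi].$$

Define the class of "frequent types" based on $\alpha(Q, \Phi)$:

$$A(Q) \triangleq \left\{ \Phi : \alpha(Q, \Phi) \ge \frac{1}{(n+1)^{|\mathcal{X}|+1}} \right\}.$$

Note that

$$\mathbb{P}[P_{\mathbf{x}} \notin A(Q) \,|\, P_{\mathbf{s}} = Q] \le |\mathcal{P}_n(\mathcal{X})|\, \frac{1}{(n+1)^{|\mathcal{X}|+1}} \le \frac{1}{n+1},$$

thus $A(Q)$ is nonempty. Trivially, we have:

$$\mathbb{P}[\mathcal{E}(D) \,|\, P_{\mathbf{s}} = Q] \ge \sum_{\Phi \in A(Q)} \alpha(Q, \Phi)\, \mathbb{P}[\mathcal{E}(D) \,|\, P_{\mathbf{s}} = Q, P_{\mathbf{x}} = \Phi]$$

$$= \sum_{\Phi \in A(Q)} \alpha(Q, \Phi) \sum_{V \in \mathcal{P}_n(\mathcal{Y}|\Phi)} \mathbb{P}[P_{\mathbf{y}|\mathbf{x}} = V \,|\, P_{\mathbf{x}} = \Phi]\, \mathbb{P}[\mathcal{E}(D) \,|\, P_{\mathbf{s}} = Q, P_{\mathbf{x}} = \Phi, P_{\mathbf{y}|\mathbf{x}} = V].$$

Next we use Lemma 7 to assert, for all $\Phi \in A(Q)$:

$$\mathbb{P}[\mathcal{E}(D) \,|\, P_{\mathbf{s}} = Q, P_{\mathbf{x}} = \Phi, P_{\mathbf{y}|\mathbf{x}} = V] \ge 1 - \frac{1}{|G(Q, \Phi)|} \sum_{\mathbf{s}_i \in G(Q, \Phi)} \frac{|\mathcal{T}_V(f(\mathbf{s}_i)) \cap B(\mathbf{s}_i, D)|}{|\mathcal{T}_V(f(\mathbf{s}_i))|} \ge 1 - p(n) \exp\{-n[R(Q, D) - \rho I(\Phi, V)]\},$$

where $p(n)$ is given in (B.77). Since $\sum_{\Phi \in A(Q)} \alpha(Q, \Phi) \ge 1 - \frac{1}{n+1}$, we further have:

$$\mathbb{P}[\mathcal{E}(D) \,|\, P_{\mathbf{s}} = Q] \ge 1 - \frac{1}{n+1} - p(n) \sum_{V \in \mathcal{P}_n(\mathcal{Y}|\Phi^*(Q))} \mathbb{P}[P_{\mathbf{y}|\mathbf{x}} = V \,|\, P_{\mathbf{x}} = \Phi^*(Q)] \exp\{-n[R(Q, D) - \rho I(\Phi^*(Q), V)]\},$$

where $\Phi^*(Q)$ minimizes the expression over all $\Phi \in A(Q)$ (if there are multiple minimizers, it is chosen arbitrarily). Collecting all source types, we have:

$$\mathbb{P}[\mathcal{E}(D)] \ge \frac{n}{n+1} - p(n) \sum_{Q \in \mathcal{P}_n(\mathcal{S})} \sum_{V \in \mathcal{P}_n(\mathcal{Y}|\Phi^*(Q))} \mathbb{P}[P_{\mathbf{s}} = Q]\, \mathbb{P}[P_{\mathbf{y}|\mathbf{x}} = V \,|\, P_{\mathbf{x}} = \Phi^*(Q)] \exp\{-n[R(Q, D) - \rho I(\Phi^*(Q), V)]\}.$$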
At this point we return the block length index $n$. Let $\Delta_n$ be some vanishing sequence to be specified later. Define the set

$$\mathcal{B}(\Delta_n) \triangleq \{ Q \in \mathcal{P}_n(\mathcal{S}),\ V \in \mathcal{P}_n(\mathcal{Y}|\Phi_n^*(Q)) : R(Q, D) - \rho I(\Phi_n^*(Q), V) > \Delta_n \}.$$

For any sequence $\Delta_n$ we can write:

n 



¥[E{D)]>¥[B{/^n)] 



p{n) exp{-nA„} 



_n + 1 

where P [-B(A„)] = P [S : P/,,^„(s) G -B(A„)] . Now choose nA„ = (1 + p{n)) log(n + 1) to obtain: 

P [8{D)] >(l- P [5(A„)] > ^ t^^^"^^ 



' [i?(Ps, Z?) - pliKiPs), V) > A„] < £ 1 + 



Since we demand that P [£^(-D)] < e for all n, and inserting the definition of B{An) it must be that 

2 

n — 1 ^ 

Seeing that $\Delta_n = O(\log n / n)$, the desired result follows on account of (24) in Lemma 6. ■

V. The Loss of Separation 

In this section we quantify the dispersion loss of a separation-based scheme with respect to the JSCC one. Using the separation approach, the interface between the source and channel parts is a fixed-rate message, as opposed to the variable-rate interface used in conjunction with multiple quantizers and UEP, shown in this work to achieve the JSCC dispersion.

Formally, we define a separation-based encoder as the concatenation of the following elements.

1) A source encoder $f_{S;n} : \mathcal{S}^n \to \mathcal{M}_n$.

2) A source-channel mapping $\mathcal{M}_n \to \mathcal{M}_n$.

3) A channel encoder $f_{C;n} : \mathcal{M}_n \to \mathcal{X}^{\lfloor \rho n \rfloor}$.

The interface rate is $R_n = \log|\mathcal{M}_n|/n$. Finally, the source-channel mapping is randomized, in order to avoid "lucky" source-channel matching that leads to an effective "joint" scheme.³ We assume that it is uniform over all permutations of $\mathcal{M}_n$, and that it is known at the decoder as well. Consequently, the decoder is the obvious concatenation of elements in reversed order. The excess distortion probability of the scheme is defined as the mean over all permutations.

In a separation-based scheme, an excess-distortion event occurs if one of the following happens: either the source coding results in excess distortion, or the channel coding results in a decoding error. Though it is possible that no excess distortion will occur when a channel error occurs (whether the source code has excess distortion or not), the probability of this event is exponentially small. Thus at every block-length $n$, the excess-distortion probability $\epsilon$ satisfies

$$\epsilon = \epsilon_{S;n} * \epsilon_{C;n} - \delta_n, \tag{33}$$

where $a * b \triangleq a + b - ab$, $\epsilon_{S;n}$ and $\epsilon_{C;n}$ are the source excess-distortion probability and channel error probability, respectively, at blocklength $n$, and $\delta_n$ is exponentially decaying with $n$. In this expression we take a fixed $\epsilon$, in accordance with the dispersion setting; the system designer is still free to choose $\epsilon_{S;n}$ and $\epsilon_{C;n}$ by adjusting the rates as long as (33) is maintained.

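We now employ the source and channel dispersion results (6), (7), which hold up to a correction term $O(\log(n)/n)$,‡
to see that for the optimal separation-based scheme:

$$R(D_n) = \rho C(W) - \min_{\varepsilon_{S;n} * \varepsilon_{C;n} \le \varepsilon} \left[ \sqrt{\frac{V_S}{n}}\, Q^{-1}(\varepsilon_{S;n}) + \sqrt{\frac{\rho V_C}{n}}\, Q^{-1}(\varepsilon_{C;n}) \right] + O\left(\frac{\log n}{n}\right). \qquad (34)$$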



† For instance, the UEP scheme could be presented as a separation one if not for the randomized mapping.

‡ The redundancy terms are in general functions of the error probabilities, but for probabilities bounded away from zero and one they can
be uniformly bounded; it will become evident that for positive and finite source and channel dispersions, this is indeed the case.
Fig. 2. Main JSCC excess distortion event: the loss of separation. (Horizontal axis: $R(Q, D)$, with the points $R(P, D)$ and $R$ marked.)



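It follows that, up to the correction term, it is optimal to choose fixed probabilities $\varepsilon_{S;n} = \varepsilon_S$ and $\varepsilon_{C;n} = \varepsilon_C$.
Furthermore, the dependence on $n$ is the same as in the joint source-channel dispersion (23), but with a different
coefficient for the $1/\sqrt{n}$ term, i.e.,

$$R(D_n) = \rho C(W) - \sqrt{\frac{V_{sep}}{n}}\, Q^{-1}(\varepsilon) + O\left(\frac{\log n}{n}\right). \qquad (35)$$

Note that in the limit $\varepsilon \to 0$, $\sqrt{V_{sep}} = \sqrt{V_S} + \sqrt{\rho V_C}$.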

In order to see why separation must have a loss, consider Fig. 2. The separation scheme designer is free to
choose the digital interface rate $R$. Now whenever the random source-channel pair is either to the right of the point
$(R, R)$ due to a source type with $R(P_{\mathbf{S}}, D) > R$, or below it due to channel behavior $I(\Phi, P_{\mathbf{Y}|\mathbf{x}}) < R$, an excess
distortion event will occur. Compared to optimal JSCC, this adds the chessboard-pattern area on the plot. The
designer may optimize $R$ such that the probability of this area is minimized, but for any choice of $R$ it will still
have a strictly positive probability.

For quantifying the loss, it is tempting to look at the ratio between the coefficients of the $1/\sqrt{n}$ terms. However,
this ratio may in general be infinite or negative, making the comparison difficult. We choose instead to define the
equivalent probability $\tilde{\varepsilon}$ by rewriting (35) as

$$R(D_n) = \rho C(W) - \sqrt{\frac{V_J}{n}}\, Q^{-1}(\tilde{\varepsilon}) + O\left(\frac{\log n}{n}\right). \qquad (36)$$

Thus, $\tilde{\varepsilon} \le \varepsilon$ is the excess-distortion probability that a JSCC scheme could achieve under the same conditions, when
the separation scheme achieves $\varepsilon$. Substitution reveals that



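$$\tilde{\varepsilon}(\varepsilon, \lambda) = Q\left( \min_{\varepsilon_S * \varepsilon_C \le \varepsilon} \frac{Q^{-1}(\varepsilon_S) + \sqrt{\lambda}\, Q^{-1}(\varepsilon_C)}{\sqrt{1 + \lambda}} \right), \qquad (37)$$

where

$$\lambda \triangleq \frac{\rho V_C}{V_S}. \qquad (38)$$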



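In general, numerical optimization is needed in order to obtain the equivalent probability. However, clearly
$\tilde{\varepsilon}(\varepsilon, \lambda) = \tilde{\varepsilon}(\varepsilon, 1/\lambda)$. In the special symmetric case $\lambda = 1$ one may verify that the optimal probabilities are

$$\varepsilon_S = \varepsilon_C = 1 - \sqrt{1 - \varepsilon},$$

thus

$$\tilde{\varepsilon}(\varepsilon, 1) = Q\left(\sqrt{2}\, Q^{-1}\left(1 - \sqrt{1 - \varepsilon}\right)\right).$$

This reflects a large loss for low error probabilities. On the other hand,

$$\lim_{\lambda \to 0} \tilde{\varepsilon}(\varepsilon, \lambda) = \lim_{\lambda \to \infty} \tilde{\varepsilon}(\varepsilon, \lambda) = \varepsilon.$$

Fig. 3. $\tilde{\varepsilon}(\varepsilon, \lambda)$ as a function of $\varepsilon$ for different values of $\lambda$. From bottom to top curve, $\lambda = \{1, 2, 3, 5, 10, 30, 100, 1000\}$.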

It seems that the symmetric case is the worst for separation, while when $\lambda$ grows away from 1, either the source
or the channel behaves deterministically in the scale of interest, making the JSCC problem practically a digital one,
i.e., either source coding over a clean channel or channel coding of equi-probable messages. This is somewhat
similar to the loss of separation in terms of excess distortion exponent. This behavior is depicted in Fig. 3.
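
The numerical optimization behind the equivalent probability in (37) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: the function names and the plain grid search are hypothetical choices, the constraint is taken with equality (so that $\varepsilon_C = (\varepsilon - \varepsilon_S)/(1 - \varepsilon_S)$), and $Q$ is evaluated through the standard normal distribution of the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def q_func(x):
    """Gaussian tail Q(x) = P[N(0,1) > x], computed as cdf(-x) for accuracy."""
    return _STD_NORMAL.cdf(-x)


def q_inv(p):
    """Inverse of Q on (0, 1)."""
    return -_STD_NORMAL.inv_cdf(p)


def equivalent_probability(eps, lam, grid=20000):
    """Grid-search sketch of (37): minimize over eps_S with eps_S * eps_C = eps.

    `grid` is an illustrative resolution parameter, not taken from the paper.
    """
    best = float("inf")
    for k in range(1, grid):
        eps_s = eps * k / grid
        # Solve eps_s + eps_c - eps_s * eps_c = eps for eps_c.
        eps_c = (eps - eps_s) / (1.0 - eps_s)
        best = min(best, q_inv(eps_s) + sqrt(lam) * q_inv(eps_c))
    return q_func(best / sqrt(1.0 + lam))


def equivalent_probability_symmetric(eps):
    """Closed form stated in the text for the symmetric case lambda = 1."""
    return q_func(sqrt(2.0) * q_inv(1.0 - sqrt(1.0 - eps)))
```

For example, `equivalent_probability(0.01, 1.0)` should agree with `equivalent_probability_symmetric(0.01)` up to the grid resolution, and swapping `lam` for `1 / lam` should leave the result essentially unchanged, in line with the symmetry noted above.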

VI. BW Expansion and Lossless JSCC 
We now wish to change the rules, by allowing the BW expansion ratio $\rho$, which was hitherto considered constant,
to vary with the blocklength $n$. More specifically, we take some sequence $\rho_n$ with $\lim_{n \to \infty} \rho_n = \rho$. It is not hard
to verify that the results of Section IV remain valid,§ and $\rho_n$ and $D_n$ are related via:

$$R(P, D_n) = \rho_n C(W) - \sqrt{\frac{V_J(P, W, \rho)}{n}}\, Q^{-1}(\varepsilon) + O\left(\frac{\log n}{n}\right), \qquad (39)$$

where for the calculation of the JSCC dispersion we use $D^*(P, W, \rho)$. In particular, one may choose to work with
a fixed distortion threshold $D = D^*(P, W, \rho)$, and then (39) describes the convergence of the BW expansion ratio
sequence to its limit $\rho$.

Equipped with this, we can now formulate a meaningful lossless JSCC dispersion problem. In (nearly) lossless
coding we demand $\hat{\mathbf{S}} = \mathbf{S}$, otherwise we say that an error event $\mathcal{E}$ has occurred. We can see this as a special case
of the lossy JSCC problem with Hamming distortion:

$$d(s_i, \hat{s}_i) = \begin{cases} 0, & s_i = \hat{s}_i \\ 1, & \text{otherwise,} \end{cases}$$

and with distortion threshold $D = 0$. While this setting does not allow for varying distortion thresholds, one may
be interested in the number of channel uses needed to ensure a fixed error probability $\varepsilon$, as a function of the
blocklength $n$. As an immediate corollary of (39), this is given by:

$$\rho_n = \frac{H(P)}{C(W)} + \sqrt{\frac{V_J(P, W, \rho)}{n}}\, \frac{Q^{-1}(\varepsilon)}{C(W)} + O\left(\frac{\log n}{n}\right). \qquad (40)$$

In lossless JSCC dispersion, the source part of $V_J(P, W, \rho)$ simplifies to $\mathrm{Var}[\log P(S)]$, in agreement with the lossless
source coding dispersion of Strassen [4].

§ Note that now the application of the Berry-Esseen theorem is more involved, as we are now summing $\rho_n n + n$ independent random variables.
However, its application still holds and the results in Section IV can be proved by keeping track of $\rho_n$ explicitly.
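
To make the corollary for the number of channel uses per source sample concrete, the following sketch evaluates its first two terms and simply drops the $O(\log n / n)$ correction. The function name and all numerical inputs are hypothetical; the entropy, capacity and joint dispersion are assumed to be given in consistent units (e.g. nats), and the dispersion is treated as a known constant.

```python
from math import sqrt
from statistics import NormalDist


def q_inv(p):
    """Inverse of the Gaussian tail function Q on (0, 1)."""
    return -NormalDist().inv_cdf(p)


def channel_uses_per_sample(entropy, capacity, v_joint, eps, n):
    """Two-term approximation of rho_n: H/C + sqrt(V_J/n) * Q^{-1}(eps) / C.

    The O(log n / n) term of the corollary is ignored in this sketch.
    """
    return entropy / capacity + sqrt(v_joint / n) * q_inv(eps) / capacity
```

With, say, `entropy=0.5`, `capacity=0.4` and `v_joint=0.3` (illustrative values only), the approximation decreases toward `entropy / capacity` as `n` grows whenever `eps < 0.5`, which is the convergence of the BW expansion ratio described above.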
Appendix A
Proofs for UEP channel coding dispersion

In this appendix we provide proofs for the results in Section III. We start by analyzing the Taylor expansion of
the empirical mutual information in Appendix A-A, which is crucial for proving Lemma 3, and then proceed to prove
Lemmas 3 and 4 in Appendix A-B.

A. Analysis of the empirical mutual information 

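In this section, we investigate the Taylor expansion of the empirical mutual information around the expected mutual
information, i.e.,

$$I(\Phi, P_{\mathbf{Y}|\mathbf{x}}) = I(\Phi, W) + \sum_{x,y} \left(P_{\mathbf{Y}|\mathbf{x}}(y|x) - W(y|x)\right) I'_W(y|x) \qquad (A.41)$$

$$\qquad\qquad + O\left(\sum_{x,y} \left(P_{\mathbf{Y}|\mathbf{x}}(y|x) - W(y|x)\right)^2\right), \qquad (A.42)$$

where $I'_W(y|x) \triangleq \left.\frac{\partial I(\Phi, V)}{\partial V(y|x)}\right|_{V = W}$. Specifically, we characterize the first-order and higher-order correction terms of
the Taylor expansion via Lemmas 8 and 10.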

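Lemma 8 (First order correction term for mutual information). If $\mathbf{Y} \sim W^n(\cdot|\mathbf{x})$, then

$$\sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} \left(P_{\mathbf{Y}|\mathbf{x}}(y|x) - W(y|x)\right) I'_W(y|x) = \sum_{x \in \mathcal{X}} \sum_{j \in \mathcal{J}_x} Z_{x,j},$$

where $\mathcal{J}_x = \{j : x_j = x\}$, $\{Z_{x,j},\, x \in \mathcal{X},\, j \in \mathcal{J}_x\}$ are independent random variables, and for a given $x$, $\{Z_{x,j},\, j \in \mathcal{J}_x\}$
are identically distributed. Furthermore,

$$\sum_{x \in \mathcal{X}} \sum_{j \in \mathcal{J}_x} E\left[\left|Z_{x,j} - E[Z_{x,j}]\right|^3\right] = O\left(\frac{1}{n^2}\right).$$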



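Proof of Lemma 8: Note

$$\sum_{x,y} \left(P_{\mathbf{Y}|\mathbf{x}}(y|x) - W(y|x)\right) I'_W(y|x) = \sum_x \left[\sum_y P_{\mathbf{Y}|\mathbf{x}}(y|x) I'_W(y|x) - \sum_y W(y|x) I'_W(y|x)\right]$$

$$\qquad = \sum_x \frac{1}{N(x|\mathbf{x})} \sum_{j \in \mathcal{J}_x} \left[I'_W(Y_j|x) - E[I'_W(Y|x)]\right].$$

Let $\tilde{Z}_{x,j} \triangleq I'_W(Y_j|x) - E[I'_W(Y|x)]$ and $Z_{x,j} = \frac{1}{N(x|\mathbf{x})} \tilde{Z}_{x,j}$, then $E[Z_{x,j}] = 0$ and

$$\mathrm{Var}\left[\tilde{Z}_{x,j}\right] = \mathrm{Var}\left[I'_W(Y|x)\right].$$

By straightforward differentiation,

$$\frac{\partial I(\Phi, V)}{\partial V(y|x)} = \Phi(x) \log \frac{V(y|x)}{\Phi V(y)},$$

thus

$$\mathrm{Var}\left[\tilde{Z}_{x,j}\right] = \Phi^2(x)\, \mathrm{Var}\left[\log \frac{W(Y|x)}{\Phi W(Y)}\right].$$

Therefore

$$\sum_x \sum_{j \in \mathcal{J}_x} \mathrm{Var}[Z_{x,j}] = \sum_x \sum_{j \in \mathcal{J}_x} \frac{1}{N(x|\mathbf{x})^2} \mathrm{Var}\left[\tilde{Z}_{x,j}\right] = \frac{1}{n} \sum_x \Phi(x)\, \mathrm{Var}\left[\log \frac{W(Y|x)}{\Phi W(Y)}\right] = \frac{V(\Phi, W)}{n},$$

where we used $N(x|\mathbf{x}) = n\Phi(x)$.

Finally, since every $Z_{x,j}$ is a discrete and finite valued variable, the sum of the absolute third moments of these variables
is bounded by some function that is $\Theta\left(\frac{1}{n^2}\right)$. ■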
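To investigate the higher order terms, we partition the channel realizations by their closeness to the true channel
distribution $W$. Given input distribution $\Phi_n$, we define

$$\Xi_n \triangleq \left\{ V : \sum_{x,y} \left(V(y|x) - W(y|x)\right)^2 \le \frac{|\mathcal{X}| \cdot |\mathcal{Y}|}{\Phi_n^{\min}} \cdot \frac{\log n}{n} \right\}, \qquad (A.43)$$

where $\Phi_n^{\min} = \min_{x \in \mathcal{X}} \Phi_n(x)$. As shown below in Lemma 9, $\Xi_n$ is "typical" in the sense that it contains a channel
realization with high probability.

Lemma 9. If $\mathbf{x} \in \mathcal{X}^n$ has type $\Phi_n$ and $\mathbf{Y} \in \mathcal{Y}^n$ is the output of the channel $W^n$ with input $\mathbf{x}$, then

$$P\left[P_{\mathbf{Y}|\mathbf{x}} \notin \Xi_n\right] \le \frac{2 |\mathcal{X}| \cdot |\mathcal{Y}|}{n^2}.$$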
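Proof of Lemma 9: Let $\beta^2 = |\mathcal{X}| \cdot |\mathcal{Y}| \cdot \frac{\log n}{n} \cdot \frac{1}{\Phi_n^{\min}}$. Then

$$P\left[P_{\mathbf{Y}|\mathbf{x}} \notin \Xi_n\right] = P\left[\sum_{a \in \mathcal{X},\, b \in \mathcal{Y}} \left(P_{\mathbf{Y}|\mathbf{x}}(b|a) - W(b|a)\right)^2 > \beta^2\right]$$

$$\qquad \overset{(a)}{\le} P\left[\bigcup_{a \in \mathcal{X},\, b \in \mathcal{Y}} \left\{\left(P_{\mathbf{Y}|\mathbf{x}}(b|a) - W(b|a)\right)^2 > \frac{\beta^2}{|\mathcal{X}||\mathcal{Y}|}\right\}\right]$$

$$\qquad \overset{(b)}{\le} \sum_{a \in \mathcal{X},\, b \in \mathcal{Y}} P\left[\left(P_{\mathbf{Y}|\mathbf{x}}(b|a) - W(b|a)\right)^2 > \frac{\beta^2}{|\mathcal{X}||\mathcal{Y}|}\right]$$

$$\qquad = \sum_{a \in \mathcal{X},\, b \in \mathcal{Y}} P\left[\left|P_{\mathbf{Y}|\mathbf{x}}(b|a) - W(b|a)\right| > \frac{\beta}{\sqrt{|\mathcal{X}||\mathcal{Y}|}}\right], \qquad (A.44)$$

where (a) follows from the fact that in order for a sum of $|\mathcal{X}||\mathcal{Y}|$ elements to be above $\beta^2$, at least one of
the summands must be above $\beta^2/(|\mathcal{X}||\mathcal{Y}|)$, and (b) follows from the union bound. For any $a \in \mathcal{X}$, $b \in \mathcal{Y}$, we have

$$P\left[\left|P_{\mathbf{Y}|\mathbf{x}}(b|a) - W(b|a)\right| > \frac{\beta}{\sqrt{|\mathcal{X}||\mathcal{Y}|}}\right] = P\left[\left|\frac{1}{n\Phi_n(a)} \sum_{i : x_i = a} \left(1_{\{Y_i = b\}} - W(b|a)\right)\right| > \frac{\beta}{\sqrt{|\mathcal{X}||\mathcal{Y}|}}\right]$$

$$\qquad \overset{(a)}{\le} 2 \exp\left\{-\frac{2\beta^2 n \Phi_n(a)}{|\mathcal{X}| \cdot |\mathcal{Y}|}\right\}, \qquad (A.45)$$

where (a) follows from Hoeffding's inequality (see, e.g. [13, p. 191]). Applying (A.45) to each of the summands
of (A.44) gives

$$P\left[P_{\mathbf{Y}|\mathbf{x}} \notin \Xi_n\right] \le \sum_{a \in \mathcal{X},\, b \in \mathcal{Y}} P\left[\left|P_{\mathbf{Y}|\mathbf{x}}(b|a) - W(b|a)\right| > \frac{\beta}{\sqrt{|\mathcal{X}||\mathcal{Y}|}}\right]$$

$$\qquad \le \sum_{a \in \mathcal{X},\, b \in \mathcal{Y}} 2 \exp\left\{-\frac{2\beta^2 n \Phi_n(a)}{|\mathcal{X}| \cdot |\mathcal{Y}|}\right\}$$

$$\qquad \le 2|\mathcal{Y}| \sum_{a \in \mathcal{X}} \exp\left\{-\frac{2\beta^2 n \Phi_n(a)}{|\mathcal{X}| \cdot |\mathcal{Y}|}\right\}$$

$$\qquad \le 2|\mathcal{X}| \cdot |\mathcal{Y}| \exp\left\{-\frac{2\beta^2 n \Phi_n^{\min}}{|\mathcal{X}| \cdot |\mathcal{Y}|}\right\}$$

$$\qquad = 2|\mathcal{X}| \cdot |\mathcal{Y}| \frac{1}{n^2}. \qquad (A.46)$$

■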
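
The Hoeffding step used for each letter pair can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the proof: it models the indicator sum for one pair $(a, b)$ as a binomial variable with $N = n\Phi_n(a)$ trials, compares its two-sided tail with the bound $2\exp(-2Nt^2)$, and confirms that the per-pair threshold $t^2 = \log n / N$ (the value obtained when $\Phi_n(a)$ equals the smallest input-type mass) yields exactly $2/n^2$. All function names and numerical values are hypothetical.

```python
from math import comb, exp


def two_sided_tail(num_trials, p, t):
    """P[|K/N - p| > t] for K ~ Binomial(N, p), by direct summation."""
    total = 0.0
    for k in range(num_trials + 1):
        if abs(k / num_trials - p) > t:
            total += comb(num_trials, k) * p**k * (1.0 - p) ** (num_trials - k)
    return total


def hoeffding_bound(num_trials, t):
    """Hoeffding's two-sided bound 2 exp(-2 N t^2) for [0, 1]-valued summands."""
    return 2.0 * exp(-2.0 * num_trials * t * t)
```

For instance, `two_sided_tail(50, 0.3, 0.1)` stays below `hoeffding_bound(50, 0.1)`, and with `n = 1000` and `num_trials = 250` the choice `t = sqrt(log(n) / num_trials)` makes `hoeffding_bound` equal to `2 / n**2` up to rounding.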



With Lemma 9, we can show that the higher order terms in (A.41) are in some sense negligible via Lemma 10.

Lemma 10 (Second order correction term for mutual information). If $\mathbf{Y} \sim W^n(\cdot|\mathbf{x})$, then there exists $J = J(|\mathcal{X}|, |\mathcal{Y}|, \Phi_n)$
such that

$$P\left[\sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} \left(P_{\mathbf{Y}|\mathbf{x}}(y|x) - W(y|x)\right)^2 > J \frac{\log n}{n}\right] \le \frac{2 |\mathcal{X}| |\mathcal{Y}|}{n^2}.$$

Proof of Lemma 10: Let

$$J = |\mathcal{X}| \cdot |\mathcal{Y}| \cdot \frac{1}{\Phi_n^{\min}},$$

then the lemma follows from the definition of $\Xi_n$ and Lemma 9. ■

Finally, we show the following lemma that is useful for asymptotic analysis. 

Lemma 11. If $f_n = O(g_n)$, then there exist $\Gamma_n$ and $\Gamma'_n = \Theta(\Gamma_n)$ such that

$$P[f_n > \Gamma'_n] \le P[g_n > \Gamma_n],$$
$$P[f_n < -\Gamma'_n] \le P[g_n > \Gamma_n]$$

when $n$ is sufficiently large.

Proof of Lemma 11: By definition there exists $c > 0$ such that when $n$ is sufficiently large,

$$-c g_n \le f_n \le c g_n.$$

Then letting $\Gamma'_n = c\Gamma_n$ completes the proof. ■

B. Proofs for UEP channel coding lemmas 

In this section we provide proofs for Lemmas 3 and 4. 

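Proof for Lemma 3: We directly prove the stronger result where $\Delta R$ is defined according to (14).
By Taylor expansion, we have

$$I(\Phi, P_{\mathbf{Y}|\mathbf{x}}) = I(\Phi, W) + \sum_{x,y} \left(P_{\mathbf{Y}|\mathbf{x}}(y|x) - W(y|x)\right) I'_W(y|x) + O\left(\sum_{x,y} \left(P_{\mathbf{Y}|\mathbf{x}}(y|x) - W(y|x)\right)^2\right),$$

where $I'_W(y|x) \triangleq \left.\frac{\partial I(\Phi, V)}{\partial V(y|x)}\right|_{V = W}$. Let

$$A(\mathbf{Y}) \triangleq \sum_{x,y} \left(P_{\mathbf{Y}|\mathbf{x}}(y|x) - W(y|x)\right) I'_W(y|x)$$

and

$$B(\mathbf{Y}) \triangleq O\left(\sum_{x,y} \left(P_{\mathbf{Y}|\mathbf{x}}(y|x) - W(y|x)\right)^2\right),$$

then

$$\varepsilon + \delta_n = P\left[I(\Phi_n, P_{\mathbf{Y}|\mathbf{x}}) < I(\Phi_n, W) - \Delta R,\ \mathbf{Y} \sim W^n(\cdot|\mathbf{x})\right]$$
$$\qquad = P\left[A(\mathbf{Y}) + B(\mathbf{Y}) < -\Delta R,\ \mathbf{Y} \sim W^n(\cdot|\mathbf{x})\right]$$
$$\qquad \overset{(a)}{\ge} P\left[A(\mathbf{Y}) + \Gamma_n < -\Delta R,\ \mathbf{Y} \sim W^n(\cdot|\mathbf{x})\right] - P\left[B(\mathbf{Y}) > \Gamma_n,\ \mathbf{Y} \sim W^n(\cdot|\mathbf{x})\right], \qquad (A.47)$$

where $\Gamma_n > 0$ and (a) follows from (D.81). Similarly, (D.80) indicates

$$\varepsilon + \delta_n = P\left[A(\mathbf{Y}) + B(\mathbf{Y}) < -\Delta R,\ \mathbf{Y} \sim W^n(\cdot|\mathbf{x})\right]$$
$$\qquad \le P\left[A(\mathbf{Y}) - \Gamma_n < -\Delta R,\ \mathbf{Y} \sim W^n(\cdot|\mathbf{x})\right] + P\left[B(\mathbf{Y}) < -\Gamma_n,\ \mathbf{Y} \sim W^n(\cdot|\mathbf{x})\right]. \qquad (A.48)$$

Let $\Gamma'_n = J(\Phi_n, |\mathcal{X}|, |\mathcal{Y}|) \frac{\log n}{n}$ with $J$ as in Lemma 10, then from Lemmas 10 and 11, there exists $\Gamma_n = \Theta(\Gamma'_n) = O\left(\frac{\log n}{n}\right)$
such that

$$P\left[B(\mathbf{Y}) > \Gamma_n,\ \mathbf{Y} \sim W^n(\cdot|\mathbf{x})\right] \le O\left(\frac{1}{n^2}\right), \qquad (A.49)$$

$$P\left[B(\mathbf{Y}) < -\Gamma_n,\ \mathbf{Y} \sim W^n(\cdot|\mathbf{x})\right] \le O\left(\frac{1}{n^2}\right). \qquad (A.50)$$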
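In addition, based on Lemma 8 and $Q(x) = 1 - Q(-x)$, we can apply the Berry-Esseen theorem (see, e.g., [3, Ch.
XVI.5]) and have for any $-\infty < \lambda < \infty$,

$$\left|P\left[A(\mathbf{Y}) > \lambda\sigma,\ \mathbf{Y} \sim W^n(\cdot|\mathbf{x})\right] - Q(\lambda)\right| \le \frac{T}{\sigma^3}, \qquad (A.51)$$

$$\left|P\left[A(\mathbf{Y}) < -\lambda\sigma,\ \mathbf{Y} \sim W^n(\cdot|\mathbf{x})\right] - Q(\lambda)\right| \le \frac{T}{\sigma^3}, \qquad (A.52)$$

where $\sigma^2 = V(\Phi, W)/n$ and $T$ is bounded by $c/n^2$, where $c$ is some constant. Denote $V(\Phi, W)$ as $V$, and apply
$\lambda_1 = (\Delta R + \Gamma_n)/\sigma$ and $\lambda_2 = (\Delta R - \Gamma_n)/\sigma$ to (A.51) and (A.52) respectively.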



' [AiY) >AR + rn,Y^W^ (-Ix)] -Q{{AR + T^) 



n 




' [AiY] < -{AR - Tn),Y ~ W (-Ix)] -Q{iAR-T 



< 




< 



c 



Therefore, 



Q((Ai? + r„)^/- 



r iA.53) 

< P[A(Y) > Ai? + r„,Y~ W(-|x)] 



(A.47) 

< e + 5„ + P[B(Y) >r„,Y~ W^"(-|x) 
(A.49) „ /^logn 



Likewise, 



n 



Q|(Aii-r„),/-j + ^ 




c (^-54) 

> P[^(Y) < -(Ai?-r„),Y~ VF"(-|x) 



(A.48) 



0.50) 



e + 



From the smoothness of Q ^ around s, 

{AR + Tn)^>Q-^ [e + 

{AR-Tn)^<Q-^ [e + 



log n 



n 
log n 



+ 



- P [B{Y) < -r„, Y ~ VK" (-Ix) 
log n' 



c 



log n 



Therefore, 



AR>xl-Q-^{e) + 



V ^ f \ogn 
n \ \/n 



Q~He)+0 



n \ n 



AR<J-Q-\e) + \-0 
V n V n 



F ^ / log 71 



V n 



logn 



n 



and finally 



V n 



log n 



n 



(A.53) 
(A.54) 



Before proving Lemma 4, we include the following lemma [2] for completeness. 
Lemma 12 ( [2, Lemma 6]). Given X and positive integers n, kn, let 

Vn = ^ (1^1^ + log(n + 1) + log kn + iy 
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Then for arbitrary (not necessarily distinct) distributions $j G Vn {t^) cind positive integers Ni with 

-logNi<H{<^>i)-r]n, i = l,2,...,m, 
n 

there exist m disjoint sets Ai C such that 

Ai c7^^,\Ai\ = Ni, i = l,2,...,m, 

and 

\Ty (x)| < Nj exp {-n [/ V) - } /f x G A 
for every i,j and V : Af" — )• X"', except for the case i = j and V is the identity matrix. 

Proof for Lemma 4: For x' G Aj, x' 7^ x, let the joint type for the triple (x, x',y) be given as the joint 

distribution of RV's X,X',Y. Then from Lemma 12, we can find {Ai} such that Ai C and — logA'^j < 

n 

H {^i) — rjn, thus X has distribution $j and X' has distribution <l>j. In addition, define 

By = i3y(x) = {y G 7y (x) : 3x' / X such that x' G Aj and / (x Ay) - Rj > 7} , 
then the cardinaUty of UyBv is upper bounded by 

\^vSv\ <iV,exp{-n [I{X,X';Y) - H{Y\X)-r]n\} 
< Njexp^nH(Y\X) - n\l {X, X' ;¥) - ??n|^} 

Then for y G ;By, 

(y I x) = exp {-n [D {V \\ W\<^,) + H{V\^i)]} 
Note that / (x' A y) - i?j > 7 impUes / {X'; Y) - Rj > 7, and / {X, X';Y) > I {X; Y), 

I {X, X'; Y) -Rj>I {X'; Y) - Rj > -f 

Hence, 

W(^v|x) < Nj e^p \^nH{Y\X) - n\l {X, X' -Y) - r]n\'^^ exp {-n[D {V \\W\^^) + H{V\^i)]} 
= Nj exp {-n [d {V \\ W\^,) + \l {X, X' ■ y) - } 
< Nj exp {-n [D [V \\ W\^i) + \Rj + 7 - ??n|"*"] } 

And 

P [/ (x' Ay) - Rj > 7] < VF" I^U^v. x^ 

< (n + l)l'^l'l^liVj exp {-n [\Rj + 7 - 7/„|+] } 



Appendix B 
Proofs for JSCC dispersion 

This appendix contains proofs for the results in Section IV. Similar to the development in Appendix A, we start
by analyzing the Taylor expansion of the distortion-rate function in Appendix B-A, then prove the relevant key
lemmas in Appendix B-B.
A. Analysis of the distortion-rate function

In this section, we investigate the Taylor expansion of $R(P_{\mathbf{S}}, D_n)$. Denote the partial derivatives of $D(P, R)$ at
$R = I(\Phi, W)$ and $Q = P$ as

$$D'_R \triangleq \left.\frac{\partial D(P, R)}{\partial R}\right|_{R = I(\Phi, W)}, \qquad D'_P(s) \triangleq \left.\frac{\partial D(Q, R)}{\partial Q(s)}\right|_{Q = P}.$$

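Assuming $D(\cdot, \cdot)$ is smooth, Taylor expansion gives

$$D(P_{\mathbf{S}}, \rho I(\Phi, P_{\mathbf{Y}|\mathbf{x}}) + \xi'_n) = D(P, \rho I(\Phi, W))$$
$$\qquad + \sum_{s=1}^{|\mathcal{S}|} \left(P_{\mathbf{S}}(s) - P(s)\right) D'_P(s)$$
$$\qquad + \left(\rho I(\Phi, P_{\mathbf{Y}|\mathbf{x}}) + \xi'_n - \rho I(\Phi, W)\right) D'_R$$
$$\qquad + O\left(\sum_{s=1}^{|\mathcal{S}|} \left(P_{\mathbf{S}}(s) - P(s)\right)^2 + \left(\rho I(\Phi, P_{\mathbf{Y}|\mathbf{x}}) + \xi'_n - \rho I(\Phi, W)\right)^2\right) \qquad (B.55)$$
$$= D(P, \rho I(\Phi, W))$$
$$\qquad + \sum_{s=1}^{|\mathcal{S}|} \left(P_{\mathbf{S}}(s) - P(s)\right) D'_P(s) + \rho D'_R \sum_{x,y} \left(P_{\mathbf{Y}|\mathbf{x}}(y|x) - W(y|x)\right) I'_W(y|x)$$
$$\qquad + B(\mathbf{S}, \mathbf{Y}, \xi'_n), \qquad (B.56)$$

where $\xi'_n = O(\log n / n)$, and the correction term is

$$B(\mathbf{S}, \mathbf{Y}, \xi'_n) \triangleq \xi'_n D'_R + O\left(\sum_{x,y} \left(P_{\mathbf{Y}|\mathbf{x}}(y|x) - W(y|x)\right)^2\right)$$
$$\qquad + O\left(\sum_{s=1}^{|\mathcal{S}|} \left(P_{\mathbf{S}}(s) - P(s)\right)^2 + \left(\rho I(\Phi, P_{\mathbf{Y}|\mathbf{x}}) + \xi'_n - \rho I(\Phi, W)\right)^2\right). \qquad (B.57)$$

For notational simplicity, we define

$$A(\mathbf{S}, \mathbf{Y}) \triangleq \sum_{s=1}^{|\mathcal{S}|} \left(P_{\mathbf{S}}(s) - P(s)\right) D'_P(s) + \rho D'_R \sum_{x,y} \left(P_{\mathbf{Y}|\mathbf{x}}(y|x) - W(y|x)\right) I'_W(y|x). \qquad (B.58)$$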

The lemmas in this subsection are organized as follows. Lemma 13 shows that the first order terms of the Taylor
expansion of $R(P_{\mathbf{S}}, D_n)$ with respect to $P$ can be represented as the sum of $n$ i.i.d. random variables. Then
Lemma 14 shows that $A(\mathbf{S}, \mathbf{Y})$ can be represented as the sum of $n + m$ independent random variables. Finally,
Lemmas 15 and 16 together with Lemmas 9 and 11 show that the higher order terms in the Taylor expansion are
negligible, as summarized in Lemma 17.

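Lemma 13. Under the conditions of Lemma 6,

$$\sum_{s \in \mathcal{S}} \left(P_{\mathbf{S}}(s) - P(s)\right) D'_P(s) = \frac{1}{n} \sum_{i=1}^{n} Z_i,$$

where $\{Z_i,\ i = 1, 2, \ldots, n\}$ are i.i.d. random variables such that

$$E[Z_i] = 0, \qquad \mathrm{Var}[Z_i] = V_D,$$

where $V_D = V_S \cdot [D'_R]^2$.

Proof:

$$\sum_{s \in \mathcal{S}} \left(P_{\mathbf{S}}(s) - P(s)\right) D'_P(s) = \frac{1}{n} \sum_{i=1}^{n} D'_P(S_i) - E[D'_P(S)] = \frac{1}{n} \sum_{i=1}^{n} \left[D'_P(S_i) - E[D'_P(S)]\right].$$

Let $Z_i = D'_P(S_i) - E[D'_P(S)]$, then $E[Z_i] = 0$ and

$$\mathrm{Var}[Z_i] = \mathrm{Var}\left[D'_P(S_i) - E[D'_P(S)]\right] = \mathrm{Var}\left[D'_P(S)\right].$$

By elementary calculus it can be shown that for all $s \in \mathcal{S}$,

$$D'_P(s) = -R'(s)\, D'_R,$$

where $R'(s)$ denotes the partial derivative of $R(Q, D)$ with respect to $Q(s)$ at $Q = P$. Therefore,

$$\mathrm{Var}[Z_i] = \mathrm{Var}\left[D'_P(S)\right] = \mathrm{Var}\left[R'(S)\right] (D'_R)^2 = V_S \cdot (D'_R)^2.$$

■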



Lemma 14 (First order correction term for distortion-rate function). Under the conditions of Lemma 6, (B.58), i.e.,
$A(\mathbf{S}, \mathbf{Y})$, is the sum of $n + m$ independent random variables, whose sum of variances is

$$\frac{1}{n} \left[(D'_R)^2 V_S + \rho (D'_R)^2 V(\Phi, W) + O\left(\frac{\log n}{n}\right)\right],$$

and whose sum of absolute third moments is bounded by some constant.

Proof for Lemma 14: According to Lemmas 8 and 13, (B.58) can be interpreted as the sum of n + m 
independent random variables. Let 0"^^ be the sum of the variance of these n + m variables, then 

2 



Vc{x) 



n 



1 



xex 



- [Vd + piD'pfV {^m,W)] 
1 



n 



p{D'rYVs + p{D'pYV{^,W) + 



logn 



n 



(B.59) 



Define $T_n$ to be the sum of the absolute third moments of these variables. Since these are discrete and finite valued
variables, $T_n$ is bounded by $J_3/n^2$, for some constant $J_3$. ■

To investigate the higher order terms, we partition the source types by their closeness to the source distribution $P$.
Given source distribution $P$, we define

$$\Omega_n(P) \triangleq \left\{Q \in \mathcal{P}_n(\mathcal{S}) : \|P - Q\|_2^2 \le |\mathcal{S}| \frac{\log n}{n}\right\}. \qquad (B.60)$$

In addition, we show the following property of the set $\Xi_n$ (defined in (A.43) in Appendix A-A):
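Lemma 15. If $P_{\mathbf{Y}|\mathbf{x}} \in \Xi_n$, then

$$\left(I(\Phi, P_{\mathbf{Y}|\mathbf{x}}) - I(\Phi, W)\right)^2 = O\left(\frac{\log n}{n}\right). \qquad (B.61)$$

Proof: By definition of $\Xi_n$,

$$\sum_{x,y} \left(P_{\mathbf{Y}|\mathbf{x}}(y|x) - W(y|x)\right)^2 = O\left(\frac{\log n}{n}\right), \qquad (B.62)$$

and therefore

$$\max_{x,y} \left(P_{\mathbf{Y}|\mathbf{x}}(y|x) - W(y|x)\right)^2 = O\left(\frac{\log n}{n}\right). \qquad (B.63)$$

The zero-th order Taylor approximation of $I(\Phi, P_{\mathbf{Y}|\mathbf{x}})$ around $W = P_{\mathbf{Y}|\mathbf{x}}$ is given by

$$I(\Phi, P_{\mathbf{Y}|\mathbf{x}}) = I(\Phi, W) + O\left(\max_{x,y} \left|P_{\mathbf{Y}|\mathbf{x}}(y|x) - W(y|x)\right|\right), \qquad (B.64)$$

therefore

$$\left(I(\Phi, P_{\mathbf{Y}|\mathbf{x}}) - I(\Phi, W)\right)^2 = O\left(\max_{x,y} \left(P_{\mathbf{Y}|\mathbf{x}}(y|x) - W(y|x)\right)^2\right), \qquad (B.65)$$

and the required result follows from (B.63). ■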

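The bounding of $B(\mathbf{S}, \mathbf{Y}, \xi'_n)$ is mainly based on the following lemma.

Lemma 16. There exists a constant $J > 0$ such that

$$P\left[\sum_{x,y} \left(P_{\mathbf{Y}|\mathbf{x}}(y|x) - W(y|x)\right)^2 + \sum_{s=1}^{|\mathcal{S}|} \left(P_{\mathbf{S}}(s) - P(s)\right)^2 + \left(\rho I(\Phi, P_{\mathbf{Y}|\mathbf{x}}) + \xi'_n - \rho I(\Phi, W)\right)^2 > J \frac{\log n}{n}\right] \le O\left(\frac{1}{n^2}\right). \qquad (B.66)$$

Proof: Based on Lemma 15, we have

$$P\left[\sum_{x,y} \left(P_{\mathbf{Y}|\mathbf{x}}(y|x) - W(y|x)\right)^2 + \sum_{s=1}^{|\mathcal{S}|} \left(P_{\mathbf{S}}(s) - P(s)\right)^2 + \left(\rho I(\Phi, P_{\mathbf{Y}|\mathbf{x}}) + \xi'_n - \rho I(\Phi, W)\right)^2 > J \frac{\log n}{n}\right]$$
$$\qquad \le P\left[P_{\mathbf{S}} \notin \Omega_n \text{ or } P_{\mathbf{Y}|\mathbf{x}} \notin \Xi_n\right]$$
$$\qquad \le P\left[P_{\mathbf{S}} \notin \Omega_n\right] + P\left[P_{\mathbf{Y}|\mathbf{x}} \notin \Xi_n\right]$$
$$\qquad \overset{(a)}{\le} \frac{2|\mathcal{S}|}{n^2} + \frac{2|\mathcal{X}| \cdot |\mathcal{Y}|}{m^2} = O\left(\frac{1}{n^2}\right),$$

where (a) follows from Lemma 9 and [6, Lemma 2]. ■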
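Lemma 17 (Second order correction term for distortion-rate function). For $\xi'_n = O\left(\frac{\log n}{n}\right)$, there exist $\Gamma_{n,1} = O\left(\frac{\log n}{n}\right)$ and $\Gamma_{n,2} = O\left(\frac{\log n}{n}\right)$ such that

$$P\left[B(\mathbf{S}, \mathbf{Y}, \xi'_n) > \Gamma_{n,1}\right] \le O\left(\frac{1}{n^2}\right), \qquad (B.67)$$

$$P\left[B(\mathbf{S}, \mathbf{Y}, \xi'_n) < -\Gamma_{n,2}\right] \le O\left(\frac{1}{n^2}\right). \qquad (B.68)$$

Proof: Let $\Gamma_{n,1} = \xi'_n D'_R + (J + |D'_R|) \log n / n$ and $\Gamma_{n,2} = -\xi'_n D'_R + (J + |D'_R|) \log n / n$, where $J$ is given
by Lemma 16; then the proof follows from Lemma 16 and Lemma 11. ■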
B. Proofs for JSCC lemmas

This section first proves Lemma 6 (JSCC Distortion Redundancy Lemma), upon which the proofs for both the
achievability and the converse of the main theorem build. Then it gives the proof for Lemma 7, which is essential
for establishing the converse result.

Proof for Lemma 6: We directly prove the stronger result where $D_n$ is defined according to (25).

We first note that for Dn, 

P [R{Ps,Dn) > pl{^^, Py|x) +in]>e + Cn. (B.69) 
By Lemma 19, for any conditional type V, there is a constant Ji = JidA'l, |3^|) such that 

loff 771 

\i{^^,v)-mv)\<j,^, 

Therefore, 

£ + Cn<P [R{Ps,Dn) > pl{<i>m, Py|x) + U] 

< P [R{Ps,Dn) > p/(<I>,Py|x) -Ji'-^ + ^n 
= F[R{Ps,Dn)>pI{'^,PY\^) + C] 

= P [Dn < D (Ps, p/(^, Py|x) + C)] , (B.70) 

where x4 = O (log n/n). Let ADn = Dn, - D*,{BJO) now becomes 

e + Cn = P[Dn< D{Ps,pI{<^, Py|x) + Q] 
= F [ADn < AiS,Y) + B{S,Y,0] ■ 

Applying (D.78) and(D.79) gives 

£ + Cn <P [^(S, Y) + r„,i > ADn] + P [B{S, Y, > r„,i] 

e + Cn >P [^(S, Y) - Tn,2 > ADn] " P [^(S, Y, < "F^.s] 
From Lemmas 11 and 17 we have 

p[i3(s,Y,0<-rn,2] <o(^) 
p[i?(s,Y,0 >rn,i] <o(^) 

Since Cn = O we absorb the O (l/n^) terms and have: 

, + o( > P [A{S, Y) > ADn - r„,i] 



e + 0{^ ] <P[A(S,Y) > ADn + rn,2] 



Based on Lemma 14, by the (non-i.i.d. version of the) Berry-Esseen theorem ( [11, XVL5, Theorem 2]) we have 
that for any a and n, 

|P [A{S, Y)>X.an]- Q{X)\ <^ = o(^ 



where T„ is bounded by c/n^, with c being a constant. Let Ai = {ADn — r„,i)/(7 and A2 = {ADn + '^n,2)/<^, 
then, 

e + O (^-^) > Q{{ADn - Tn,i)/a) + O 
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absorbing the O (^-^^ on the right hand side, we have 



e + 0(i^) >Q((AZ)„-r„,i)/a„), 



^ ^ ^ logn + r„,2)K). 



From the smoothness of Q ^ around e, noting r„ j = O (log n/n) ,i = 1,2 and replace (7„ as in Lemma 14, we 
obtain 



Therefore, 



We add D* and apply R{P,D) to both sides of(B.71). With the Taylor approximation we have 



RiP, D^) = m W) + \l'^^^^^^^^^Q-\e)\D'^\lin + O 



^ R{P,,Dn)-in 



n \ n 

where i?'^, = Finally, note that is negative, and combined with the fact that D'j^R'^ = 1 we have the 

required 

RiP, «„) = m. w) - JyEEE^q-^ (.) + o (i5s^ 

V n \ n 

In order to establish (24), write: 

e„^P[i?(Ps,I?n) >/>/(^n^(S),PY|x)+Cn] = 1° = s] P (s) , Py|x) < ^^.(^s)] , 

where 

Tn{Ps) 

P 

Clearly, the optimal is only a function of T„(Ps)- Thus, 

en > J] P [Tn{Ps) = F Py|x) < t] ■ (B.74) 

t 

Without loss of generality we restrict the thresholds to those satisfying 

t > C{W) - O (i^) , (B.75) 

since otherwise the theorem is satisfied trivially. Now define the set 

Yi{W,5) ^{<^eV{X): 3^* G Yi{W) : \\^-<^*\\ < 6}. 
Since I{^,W) is concave in <I>, it follows that 

sup I{^,W) = C{W)-e{6) 

where e{6) > for any 6 > 0. Thus, for thresholds that satisfy (B.75) and for <I> ^ fl{W, 6) (for any choice of 
6 > 0): 

lim P [/(<!>, Py|x) <i] =1- 
It follows that we may restrict $\Phi_m(t)$ in (B.74) to any set $\Pi(W, \delta)$ with $\delta > 0$. Since inside that set the Hessian
of $I(P, W)$ (as a function of $W$) can be uniformly bounded (see [5, Appendix I]), we have that (A.53) and (A.54)
hold uniformly (i.e. with the same constant A) for all $\Phi \in \Pi(W, \delta)$. Consequently,



P [I{1>„,{t), Py|x) <t]>l-Q(^{t- m^{t), W) 



n 



+ 



Since without the last correction term the probability is minimized by any ^*{W) G n(l^) and that correction 
term is uniform, we have that 

P[/(cI>„(t),PY|x) <t] > P (W^), Py|x) <i] -O(^) • 



Then, (B. 74) becomes: 

£n + 



^ {-^) ^ T.^[TniPs) = t]P [/($*W,Py|x) <t]=F [P(Ps,Dn) > (PF) , Py|x) + ^n] 

Since the O (VV^) term may be included in a ^„ sequence, it follows that one cannot do better, to the approximation 
required, then using a fixed input type ^*{W) for all source strings, resulting in (24). 



To show the converse of the JSCC problem defined in Section I, we first upper bound the fraction of source
codewords that are $D$-covered by a given reconstruction sequence.

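Lemma 18 (Restricted $D$-ball size). Given source type $P$ and a reconstruction sequence $\hat{\mathbf{s}}$, define the restricted $D$-ball
as

$$B(\hat{\mathbf{s}}, P, D) \triangleq \left\{\mathbf{s} \in T_P^n : d(\mathbf{s}, \hat{\mathbf{s}}) \le D\right\}.$$

Then

$$|B(\hat{\mathbf{s}}, P, D)| \le (n+1)^{|\mathcal{S}||\hat{\mathcal{S}}|} \exp\left\{n\left[H(P) - R(P, D)\right]\right\}.$$

Proof: Let $P \in \mathcal{P}_n(\mathcal{S})$ be a given type and let $Q$ be the type of $\hat{\mathbf{s}}$. Then the size of the set of source codewords
with type $P$ that are $D$-covered by $\hat{\mathbf{s}}$ is

$$|B(\hat{\mathbf{s}}, P, D)| = \left|\bigcup_{\Lambda :\, E_{P,\Lambda}[d(S, \hat{S})] \le D,\ P_\Lambda = Q} \left\{\mathbf{s} \in T_P^n : P_{\mathbf{s},\hat{\mathbf{s}}} = P \times \Lambda\right\}\right|.$$

Note there are at most $(n+1)^{|\mathcal{S}||\hat{\mathcal{S}}|}$ joint types, and

$$\left\{\mathbf{s} \in T_P^n : P_{\mathbf{s},\hat{\mathbf{s}}} = P \times \Lambda\right\} = T_{\bar{\Lambda}}(\hat{\mathbf{s}}),$$

where $\bar{\Lambda}$ is the reverse channel from $\hat{S}$ to $S$ such that $Q \times \bar{\Lambda} = P \times \Lambda$. Therefore,

$$|B(\hat{\mathbf{s}}, P, D)| \le \sum_{\bar{\Lambda} :\, E_{Q,\bar{\Lambda}}[d(S, \hat{S})] \le D} \left|T_{\bar{\Lambda}}(\hat{\mathbf{s}})\right| \le (n+1)^{|\mathcal{S}||\hat{\mathcal{S}}|} \exp\left\{n \max_{\bar{\Lambda} :\, E_{Q,\bar{\Lambda}}[d(S, \hat{S})] \le D} H(\bar{\Lambda}|Q)\right\}.$$

Note

$$R(P, D) = \min_{\Lambda :\, E_{P,\Lambda}[d(S, \hat{S})] \le D} I(P, \Lambda) = H(P) - \max_{\bar{\Lambda} :\, E_{Q,\bar{\Lambda}}[d(S, \hat{S})] \le D} H(\bar{\Lambda}|Q),$$

hence

$$|B(\hat{\mathbf{s}}, P, D)| \le (n+1)^{|\mathcal{S}||\hat{\mathcal{S}}|} \exp\left\{n\left[H(P) - R(P, D)\right]\right\}.$$

■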
Remark 7. Lemma 3 in [9] is similar to Lemma 18. However, it does not bound the size of the restricted $D$-ball
uniformly, so we choose to prove Lemma 18, which is necessary for proving Lemma 7.

Proof for Lemma 7: In our proof, we first bound the denominator in(28) uniformly for all Sj, and then bound 
the sum of the numerator over all Sj, as done in [8] for the channel error exponent. 

a) Bounding the denominator: Based on standard results in method of types [14], for /(s) G TJ, 

'■^ll^l exp {mi? (F|$)} < |7v"(/(s))| 

Hence 

1 - mm 



<{m+ iy-^\\y\ exp {-mH {V\^)} 



b) Bounding the sum of numerator: Note that since s G G(Q,<I>), 

yeTv (/(s)) n ^(s, D)^se B(5J;n(y), Q, D) n G(Q, $), (B.76) 
hence any y will be counted at most \B{gj-n{y)^ Q, D) H G{Q, ^)\ times. According to Lemma 18, this is upper 
bounded by Bu = {n + exp {n [H{Q) - R{Q, D)]} . In addition, it is obvious that 

U Tvif{si))nBisi,D)cT^, 

s.eG(Q,*) 

where ^' = is the channel output distribution corresponding to <I>. Therefore, 



1 



\G{Q,'^) 



\Tv(J{si))nB{si,D) 



s.eG(Q,<I>) 



'q 



U Tvifisi))nB{s„D) 



s.eT; 



Q 



(n+l)W+i 

^ -Dti I /"I- I ■ 



-T-'n 
In 



Noting 



we have 



{n + l)-^^^exp{nH{Q)}<\TS\ 



\T^\<exp{mHi^)}, 



n 



log 



(n + 1)1^1+1 



In 



Bu \ T^\ 



< ^^±i log(n + 1) + ^ log(n + l)-H (Q) 

n n 



+ 



n 

<pH{^)-R{Q,D) 



\og{n + l) + H{Q)-R{Q,D) 



+ 



n 



\X\ + 1 151 

log(n + 1) H log(n + 1) H log(n + 1) 

n n 



Combining the bounds for both numerator and denominator, we have 



1 



n 



■log 



1 



+ 



'^113^1 



E 



\Tv{f{si))r^B{suD) 



s,6G(Q,<I>) 
< pH{^) - pH {V\^) - R{Q, D) 



Tvifisi)) 



m 



log(m + 1) + 



S 



n 
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log(n + 1) H log(n + 1) H log(n + 1) 

n n 
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logn 



n 



Note m = \_pn\ < pn, let 

Pin) = (pn+l)H^liy|(n+l)«[(l'5||'5|){|^l+i){|5|)]^ (B_77) 
and the proof is completed. ■ 

Appendix C 

Continuity of the mutual information function 

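In this section we show the continuity of the mutual information function, which shows that for the investigation of
dispersion, arguments based on types are essentially the same as arguments based on general probability distributions.

Lemma 19. For $P, Q \in \mathcal{P}(\mathcal{X})$, if $\|P - Q\|_\infty \le \delta < 1/(2|\mathcal{X}||\mathcal{Y}|)$, then

$$|I(P, W) - I(Q, W)| \le \delta |\mathcal{X}| \log |\mathcal{Y}| - |\mathcal{Y}| |\mathcal{X}| \delta \log\left(|\mathcal{X}| \delta\right).$$

Therefore, when $\delta = O\left(\frac{1}{n}\right)$,

$$|I(P, W) - I(Q, W)| = O\left(\frac{\log n}{n}\right).$$

Proof: Let $P_Y = [P \times W]_Y$ and $Q_Y = [Q \times W]_Y$, and note

$$\|P_Y - Q_Y\|_1 \le \delta |\mathcal{X}| |\mathcal{Y}|.$$

Let $\delta' = |\mathcal{X}| \delta$, then Lemma 1.2.7 in [10] shows

$$|I(P, W) - I(Q, W)| = \left|\left(H(P_Y) - H(W|P)\right) - \left(H(Q_Y) - H(W|Q)\right)\right|$$
$$\qquad \le \left|H(P_Y) - H(Q_Y)\right| + \left|H(W|P) - H(W|Q)\right|$$
$$\qquad \le -|\mathcal{Y}| \delta' \log \delta' + \delta |\mathcal{X}| \log |\mathcal{Y}|$$
$$\qquad = \delta |\mathcal{X}| \log |\mathcal{Y}| - |\mathcal{Y}| |\mathcal{X}| \delta \log\left(|\mathcal{X}| \delta\right).$$

■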
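
As a quick sanity check of the continuity bound of Lemma 19, the sketch below evaluates both sides for one small example. Everything specific here is an assumption made for illustration: the helper names, the use of natural logarithms, and the example of a binary symmetric channel with crossover probability 0.1 and two input distributions that differ by 0.01 in each coordinate.

```python
from math import log


def mutual_information(p, w):
    """I(P, W) in nats for input distribution p and channel rows w[x][y]."""
    num_outputs = len(w[0])
    p_y = [sum(p[x] * w[x][y] for x in range(len(p))) for y in range(num_outputs)]
    total = 0.0
    for x, p_x in enumerate(p):
        for y in range(num_outputs):
            if p_x > 0.0 and w[x][y] > 0.0:
                total += p_x * w[x][y] * log(w[x][y] / p_y[y])
    return total


def continuity_bound(delta, num_inputs, num_outputs):
    """Right-hand side of Lemma 19 for a sup-norm distance delta."""
    return (delta * num_inputs * log(num_outputs)
            - num_outputs * num_inputs * delta * log(num_inputs * delta))
```

For the binary symmetric channel `[[0.9, 0.1], [0.1, 0.9]]` with `p = [0.5, 0.5]` and `q = [0.51, 0.49]`, the gap between the two mutual informations is far smaller than `continuity_bound(0.01, 2, 2)`, which illustrates that the bound is loose but of the right $\delta \log(1/\delta)$ shape.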



Appendix D 
Elementary Probability Inequalities 

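In this section we prove several simple probability inequalities used in our derivation.

Lemma 20. Let $A$ and $B$ be two (generally dependent) random variables and let $c$ be a constant. Then for any
values $\Gamma_1, \Gamma_2, \Gamma_3, \Gamma_4$, the following holds:

$$P[A + B > c] \le P[A > c - \Gamma_1] + P[B > \Gamma_1], \qquad (D.78)$$

$$P[A + B > c] \ge P[A > c + \Gamma_2] - P[B < -\Gamma_2], \qquad (D.79)$$

$$P[A + B < c] \le P[A < c + \Gamma_3] + P[B < -\Gamma_3], \qquad (D.80)$$

$$P[A + B < c] \ge P[A < c - \Gamma_4] - P[B > \Gamma_4]. \qquad (D.81)$$

Proof: To show (D.78), let $\mathcal{E}_A = \{A > c - \Gamma_1\}$, $\mathcal{E}_B = \{B > \Gamma_1\}$, and $\mathcal{E} = \{A + B > c\}$. Note that

$$\mathcal{E}_A^c \cap \mathcal{E}_B^c \subseteq \mathcal{E}^c,$$

hence by De Morgan's law,

$$\mathcal{E} \subseteq \mathcal{E}_A \cup \mathcal{E}_B.$$

We prove (D.78) by the union bound

$$P[\mathcal{E}] \le P[\mathcal{E}_A] + P[\mathcal{E}_B].$$

Apply (D.78) on $-A$, $-B$, $-c$ and we obtain (D.79) after rearrangement.

Subtract 1 from both sides of (D.79) and replace $\Gamma_2$ by $\Gamma_3$, and we obtain (D.80) after rearrangement.

Apply (D.79) on $-A$, $-B$, $-c$ and we obtain (D.81) after rearrangement. ■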
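
The four inequalities of Lemma 20 can be checked exhaustively on a small example. The sketch below does so with exact rational arithmetic; it is an illustration only, and the particular dependent joint distribution (weights proportional to $1 + (a+b)^2$ on a small integer grid) and all function names are assumptions made for the example, with the events read with strict inequalities.

```python
from fractions import Fraction
from itertools import product


def make_joint(values):
    """A dependent joint pmf on values x values, weights 1 + (a + b)^2."""
    weights = {(a, b): Fraction(1 + (a + b) ** 2)
               for a, b in product(values, repeat=2)}
    total = sum(weights.values())
    return {pair: weight / total for pair, weight in weights.items()}


def prob(joint, event):
    """Exact probability of the event {(a, b) : event(a, b)}."""
    return sum((p for (a, b), p in joint.items() if event(a, b)), Fraction(0))


def lemma20_holds(joint, c, gamma):
    """Check (D.78)-(D.81) for one threshold c and one slack gamma."""
    p_gt = prob(joint, lambda a, b: a + b > c)
    p_lt = prob(joint, lambda a, b: a + b < c)
    d78 = p_gt <= (prob(joint, lambda a, b: a > c - gamma)
                   + prob(joint, lambda a, b: b > gamma))
    d79 = p_gt >= (prob(joint, lambda a, b: a > c + gamma)
                   - prob(joint, lambda a, b: b < -gamma))
    d80 = p_lt <= (prob(joint, lambda a, b: a < c + gamma)
                   + prob(joint, lambda a, b: b < -gamma))
    d81 = p_lt >= (prob(joint, lambda a, b: a < c - gamma)
                   - prob(joint, lambda a, b: b > gamma))
    return d78 and d79 and d80 and d81
```

Calling `lemma20_holds(make_joint(range(-2, 3)), c, gamma)` over a grid of thresholds and slacks should return `True` every time, since the inequalities hold for any joint distribution.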